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Abstract 

Over  the  past  several  years  many  contributions  have  been  made  to  the  problem  of  detect¬ 
ing  underwater  acoustic  signals  and  estimating  signal  parameters  such  as  time-of-arrival, 
frequency-of-arrival,  angle-of- arrival,  etc.  In  the  previous  research  effort,  the  spatial  detec¬ 
tion  problem  was  re-examined  from  a  decision-directed  point  of  view,  and  a  methodology 
was  presented  to  estimate  the  time-varying  joint  prior  spatial  distribution  of  the  signal.  In 
the  current  research  effort,  attention  is  directed  toward  improving  detection  of  temporal 
signals  according  to  the  frequency  content  of  the  signal.  The  source  environment  char¬ 
acterizes  the  presence  or  absence  of  the  various  sources  (both  signals  and  noise)  as  they 
evolve  in  frequency  and  time.  It  may  not  be  assumed  that  this  environment  is  stationary 
in  either  frequency  content  or  time,  hence  it  is  necessary  to  characterize  and  track  the 
nonstationary  behavior  of  this  environment.  Knowledge  of  the  source  environment  may  be 
used  to  effect  on-line  adaptation  of  the  decision  strategies  used  to  detect  hostile  targets, 
and  has  potential  for  modifying  the  collection  procedures.  The  key  result  of  this  report 
is  the  development  of  a  decision-directed  empirical  Bayes  decision  rule  which  permits  a 
nonstationary  prior  marginal  probability  distribuiton  to  be  estimated  (i.e.,  tracked)  based 
upon  the  time- varying  frequency  content  of  the  signal. 
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Chapter  1 


Introduction 

1.1  Problem  Statement 

The  problem  addressed  by  this  investigation  is  a  continuation  of  the  issues  addressed  by 
[34],  and  represents  a  natural  extension  of  that  research.  The  practical  problem  which 
motivates  this  study  is  that  of  detecting,  localizing,  and  classifying  foreign  submarine  ac¬ 
tivity  by  means  of  acoustic  surveillance  sensors  located  at  strategic  positions,  particularly 
in  locations  where  noise  sources  are  present  which  emit  significant  energy  in  the  frequency 
bands  of  interest  (e.g.,  ice  noise).  These  noise  sources  often  occur  as  narrowband  “tonals” 
that  are  difficult  to  separate  from  legitimate  targets.  Consequently,  the  target  detection 
problem  is  greatly  complicated  by  their  presence,  and  an  important  problem  is  to  develop 
procedures  for  distinguishing  real  targets  from  pseudo  targets.  Due  to  the  nature  of  the 
signals  involved,  much  harmonic  information  is  generated  which  may  allow  the  system 
to  identify  source  characteristics.  For  example,  underwater  sources  radiate  narrow-  and 
broad-band  acoustic  energy  due  to  propulsion  systems,  auxiliary  machinery,  and  hydro- 
dynamic  effects  [1].  These  effects  comprise  what  is  generally  referred  to  as  the  source 
signature.  Since  these  effects  occur  at  different  levels  and  frequencies  depending  on  the 
actual  shape,  size,  and  operating  mode  of  the  vessel,  the  observed  signal  components  can 
be  compared  with  a  set  of  known  target  characteristics  for  identification  purposes.  Hence, 
it  is  very  advantageous  to  estimate  the  spectrum  of  the  various  received  signals  for  identifi¬ 
cation  purposes.  For  this  research  effort,  we  concentrate  not  on  the  usual  spectral  power  or 
energy  estimation  methods,  but  on  the  application  of  the  above  analysis  for  the  estimation 


of  the  probability  of  spectral  signal  presence  as  a  method  for  more  robust  identification 
procedures. 

The  major  thrust  of  this  investigation  is  the  development  of  adaptive  methods  for 
estimating  the  signal  environment  as  characterized  by  the  prior  probabilities  of  signal 
content.  These  methods  lead  to  the  structuring  of  decision-directed  detection  procedures 
that  are  capable  of  real-time  adaptation  to  a  changing  environment  (i.e.,  nonstationarv 
priors). 

Review  of  Previous  Research 

The  investigation  conducted  under  the  previous  contract  (N00039-85-C-0223)  [34,35.33] 
provides  encouraging  results  regarding  adaptive  classification  of  signals.  That  work  con¬ 
centrates  primarily  on  the  classification  of  the  spatial  frequency  content  of  signals,  and 
yields  a  decision-directed  empirical  Bayes  methodology  to  adaptively  adjust  the  decision 
rule  to  account  for  the  current  joint  prior  distribution  of  signals  of  interest.  The  result¬ 
ing  detector  represents  a  potential  alternative  to  classical  Neyman-Pearson,  minimax.  and 
Bayesian  decision  rules. 

The  algorithm  central  to  the  success  of  this  adaptive  decision  rule  is  a  recursive,  non¬ 
linear,  exact  minimum  mean  square  estimator  proposed  by  Segall  [29]  and  first  applied  to 
decision-directed  detection  by  Stirling  [32].  This  algorithm  was  extended  to  multivariate 
detection  in  [34.33], 

The  spatial  detection  problem  is  that  of  detecting  threat  signals  from  spatially  separated 
acoustic  surveillance  sensors.  The  signal  is  often  embedded  deeply  in  the  noise  background, 
and  standard  decision  criteria  yield  marginal  results.  In  order  to  apply  the  Bayes  formula, 
one  must  know  the  a  priori  distribution  of  the  signal  with  respect  to  the  spatial  coordinates 
of  the  detection  system.  Unfortunately,  this  is  rarely  the  case  and.  furthermore,  even 
if  known,  this  distribution  would  likely  be  non-stationarv  (i.e..  it  would  evolve  in  time 
and  space),  since  the  threat  environment  is  subject  to  change.  Errors  in  the  a  prion 
distribution  may  seriously  degrade  performance  of  a  Bayes  detector  in  this  environment. 
Furthermore,  in  a  weak  signal-to-noise  environment,  a  constant  probability  of  false  alarm 
may  result  in  the  probability  of  a  missed  detection  being  excessively  large,  and  decision 


rules  based  upon  a  specified  constant  false-alarm  probability  may  be  inadequate.  Also, 
minimax  decision  rules  are  unduly  pessimistic  in  this  environment,  and  may  not  lead  to 
acceptable  performance. 

In  view  of  these  issues,  the  empirical  Bayesian  approach,  is  explored  in  [34],  and  the 
prior  distribution  is  estimated  from  the  data.  Empirical  Bayes  procedures  are  well  known. 
[22],  but  traditionally  deal  almost  exclusively  with  the  stationary  case,  wherein  the  prior 
distribution  is  constant.  For  this  case  there  exist  asymptotically  sub-minimax  decision 
rules  that  approach  the  Bayes  envelope.  Our  problem  is  somewhat  more  complex,  how¬ 
ever.  since  we  must  allow  this  distribution  to  be  time-varying.  Throughout  the  course 
of  a  collection,  the  target  environment  is  subject  to  change  as  sources  move  through  the 
collection  region,  enter  and  exit  sensor  beams,  etc.  One  may  not  be  able  to  wait  until  all 
the  data  are  received  to  make  decisions.  In  fact,  a  real-time  decision  making  capability  is 
necessary  and.  critically,  it  must  be  able  to  adaptively  adjust  the  structure  of  the  decision 
rule  to  ensure  that  the  decisions  are  being  made  as  accurately  as  possible.  These  con¬ 
straints  on  the  empirical  Bayes  procedure  are  severe,  and  render  classical  '‘feed-forward" 
decision  processes  inadequate  to  deal  with  the  tracking  capability.  An  alternative  ''feed¬ 
back'’  approach  is  developed  in  this  analysis.  Such  a  decision- directed  approach  represents 
a  significant  departure  from  classical  empirical  Bayes  procedures,  but  fits  well  into  the 
general  class  of  adaptive  detection  procedures  such  as  generalized  likelihood  ratio  tests. 

Emphasis  of  Current  Research 

The  above  discussion  for  spatial  frequency  has  a  direct  analog  for  temporal  frequency,  where 
the  decision  problem  consists  of  determining  temporal  frequency  structure  rather  than 
spatial  frequency  structure.  In  this  case  also,  the  empirical  Bayes  approach  is  applicable, 
arid  may  be  used  to  estimate  the  prior  distribution  on  the  frequency  content  of  the  signal 
environment. 

An  additional  issue  may  be  addressed  in  the  temporal  frequency  context.  Consider  a 
collection  scenario  m  an  environment  where  the  signal  is  corrupted  by  '‘tonal"  noise,  such  as 
that  encountered  in  an  environment  where  the  movement  of  ice  generates  noise  tonals  that 
are  within  the  spectrum  occupied  by  threat  emitters.  In  such  situations,  a  key  problem 


is  that  of  classifying  the  signal  as  a  threat  or  as  benign  (i.e.,  of  natural  origin).  Thus, 
the  problem  becomes  one  of  not  only  detecting  the  signal,  but  of  recognizing  structural 
characteristics  that  serve  to  separate  threat  and  benign  signals. 

To  address  this  problem,  a  robust  method  of  spectral  estimation  is  proposed  and  exam¬ 
ined.  This  method  employs  a  decision-directed  empirical  Bayes  decision  rule  to  estimate 
the  time- varying  prior  probability  of  signal  occurrence  in  a  given  FFT  (fast  Fourier  trans¬ 
form)  bin  of  a  signal.  The  a  priori  probability  of  spectral  content  for  each  bin  is  modeled  as 
a  finite-state  Markov  chain  and  the  state  of  this  chain  is  estimated  by  obtaining  the  Doob 
decomposition  of  the  discrete-time  point  process  representing  the  decisions.  A  generalized 
empirical  Bayes  likelihood  ratio  test  is  used  as  the  detector  which  feeds  decisions  to  the 
state  estimator. 

The  present  research  is  focused  upon  the  application  of  decision-directed  empirical 
Bayes  methods  to  estimate  the  probability  of  spectral  energy  independently  for  discrete 
frequencies.  This  approach  does  not  employ  information  about  any  harmonic  dependencies 
which  a  practical  signal  would  reasonably  be  expected  to  possess.  Therefore,  potentially 
more  robust  detection  schemes  would  take  advantage  of  the  harmonic  content  of  the  sig¬ 
nal  to  improve  the  detector  performance.  One  approach  which  seems  viable  is  to  use 
conditional  factorization  of  the  joint  probability  density  function  in  a  distributed-network 
technique  similar  to  that  done  by  Stirling  and  Swindlehurst  [34].  Another  approach  which 
may  achieve  improved  performance  is  to  provide  for  feedback  of  post-detection  classifica¬ 
tion  decision  information  to  sensitize  the  harmonicallv  related  scalar  detectors. 


1.2  Summary  of  Technical  Approach 

Decision-Directed  Analysis 

The  philosophy  of  decision-directed  procedures  is  illustrated  in  Figure  1.1.  In  this  figure, 
the  outputs  of  the  signal  processing  (including  detection)  portion  of  this  diagram  may  be 
used  to  generate  an  estimate  of  the  signal  model  (including  both  deterministic  and  prob¬ 
abilistic  aspects  of  the  model)  and  feed  it  back  into  the  spatio-temporal  signal  processing 
block  to  modify  the  structure  of  the  collection/detection  system.  There  are  a  number  of 


possible  feedback  strategies  that  may  be  adopted,  including  the  following. 


Figure  1.1:  Signal  Processing  Flow-Feedback  Among  Subsystems 

1.  Empirical  Bayes.  The  most  obvious  feedback  strategy  is  to  modify  the  decision  rules 
in  the  empirical  Bayes  sense.  Such  a  strategy  provides  estimates  of  the  prior  distri¬ 
bution.  based  upon  the  past  decisions,  which  modify  future  behavior  of  the  decision 
rule.  Key  issues  associated  with  this  procedure  are  the  stability  and  robustness  of  the 
estimator  and  the  ability  to  track  time-varying  priors.  These  issues  will  be  elucidated 
throughout  this  report  and  a  multivariate  empirical  Bayes  detection  procedure  will 
be  introduced. 

2.  Control.  A  second  type  of  feedback  is  to  use  the  detected  signals  and  associated 
estimated  probabilities  to  actually  control  or  modify  the  spatio-temporal  structure  of 
the  sensor  arrays.  For  example,  array  allocation  strategies,  beamsteering,  computer 
resource  allocation,  etc.,  are  all  possible  control  strategies  which  modify  or  tune  the 
system  to  improve  performance.  Technical  possibilities  will  be  explored  but  not  fully 
developed  in  this  report. 

3.  Modeling.  A  third  type  of  feedback  is  to  use  the  detected  and  estimated  signal 
structure  to  modify  the  channel  model  to  perform  signal  enhancement  (e.g.,  channel 
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equalization)  to  shape  the  received  signal  for  more  reliable  detection  and  classifica¬ 
tion.  Such  techniques  are  not  being  considered  under  this  current  effort. 

The  Empirical  Bayes  Approach 

The  espousal  of  the  Bayesian  approach  implies  that  the  unknown  state  of  nature  is  de¬ 
scribed  by  a  prior  probability  distribution.  The  empirical  Bayes  decision  problem  is  for¬ 
mulated  exactly  the  same  way  as  a  standard  Bayes  problem,  except  that  prior  is  unknown 
and  must  be  estimated  from  the  available  data.  Suppose  that  a  particular  decision  problem 
occurs  repeatedly  and  independently,  with  the  same  unknown  prior  distribution  through¬ 
out  the  experiment.  (Such  a  situation  obtains  when  we  attempt  to  detect  a  weak  or  fading 
target  in  a  noisy  environment  or  in  the  presence  of  tonal  noise).  Under  this  supposition,  it 
is  logical  to  perform  analysis  on  the  observation  in  an  attempt  to  discover  the  prior  distri¬ 
bution.  We  may  define  an  empirical  decision  procedure  to  be  a  sequence  of  decision  rules 
which  learn  or  adapt  from  previous  experiments  and  “converge’’  in  some  sense  to  the  true 
prior.  Robbins  and  related  researchers  [21,22,24,15]  describe  the  theory  of  asymptotically 
optimal  decision  procedures  and  demonstrate  that  such  procedures  converge  to  the  Bayes 
envelope  function  as  the  number  of  experiments  increases.  These  results  are  based  upon 
the  assumption  that  the  prior  distribution  is  constant  throughout  the  experiment. 

A  key  aspect  of  this  study  is  that  the  prior  distribution  is  not  only  unknown,  it  is  subject 
to  change  as  well.  This  change  in  the  assumptions  about  the  prior  constitutes  a  significant 
difficulty,  since  the  classical  convergence  results  may  no  longer  be  valid.  In  the  extreme 
case  where  the  changes  are  completely  unpredictable,  it  is  likely  true  that  the  empirical 
Bayes  approach  is  doomed  to  failure,  and  some  other  decision  criteria  should  be  evoked. 
If  the  changes  can  be  modeled,  however,  then  there  is  hope  that  the  prior  distribution 
may  be  “tracked”  in  a  manner  entirely  analogous  to  the  way  a  moving  target  is  tracked 
using,  say,  a  Kalman  filter.  The  key  assumption,  therefore,  is  the  model  that  is  used  to 
describe  the  evolution  of  the  distribution.  In  this  study  we  develop  such  a  model,  cast  in  a 
state-space  environment  (analogous  to  a  differential  system)  and  present  a  recursive  state 
estimator  (analogous  to  a  Kalman  filter)  to  estimate  the  time- varying  prior  distribution. 

To  accomplish  the  goal  of  tracking  the  changing  distribution,  we  evoke  a  particular  type 


of  empirical  Bayes  decision  rule,  which  may  be  termed  a  feedback,  or  decision-directed  rule. 
Such  a  rule  uses  past  decisions  to  estimate  the  prior,  rather  than  past  data  directly.  To 
illustrate  the  difference  between  a  feedback  and  a  feed-forward  rule,  consider  the  following 

simple  decision  problem:  Suppose  we  observe  a  signal  y(t),  for  t  =  0.  1.2 . and  at  each 

sample,  the  signal  may  be  white  noise  with  mean  zero  or  with  mean  b  ^  0.  The  decision 
problem  may  be  formulated  as  a  hypothesis  testing  problem  of  the  form 


H0  :  y(t)  =  n(t) 

Hi  :  y(t)  =  b+  n(t) 


t  =  0. 1,2 _ 


where,  say,  n(t)  is  an  white  Gaussian  noise  sequence  of  known  variance,  and  6  is  a  known 
constant.  If  the  prior  distribution  is  known,  the  problem  admits  the  well-known  Bayes 
solution.  If  not,  the  empirical  Bayes  approach  is  to  estimate  this  prior.  Suppose  we 
consider  a  feed-forward  approach.  Let  7r (f)  =  P{H\  true  at  time  f)  If  t (/)  is  a  constant, 
they  we  may  form  an  estimate  at  time  t  as 

lu  1=1 

which  will  converge  to  the  true  value  as  t  — ♦  oo.  If  n(t)  is  not  constant,  however,  such 
simple  procedures  are  inadequate.  If  we  pursue  the  standard  empirical  Bayes  approach, 
we  must  postulate  an  equation  of  evolution  for  n(t),  say  n (t  +  1)  =  /[*■(<),<]  4-  w(t),  a 
stochastic  difference  equation.  We  are  also  required  to  ensure  that  0  <  rr(  f )  <  1  for  all  t. 
Once  this  model  is  in  place,  an  appropriate  estimation  rule  must  be  developed,  which  will 
in  general  be  nonlinear. 

Alternatively,  we  may  pursue  a  feedback  approach,  and  deal  not  with  the  past  values 
of  y(t),  but  with  past  decisions.  We  may  model  the  sequence  of  past  decisions  as  a  point 
process ,  {-V},  where  N(t)  =  1  if  the  hypothesis  Hi  is  selected,  and  N(t)  =  0  otherwise. 
Dealing  with  {iV}  is  much  simpler  than  dealing  with  the  original  sequence  {y}.  As  we  shall 
illustrate  in  this  study,  it  is  indeed  possible  to  formulate  physically  meaningful  procedures 
to  describe  the  evolution  of  the  probability  structure  of  ,V(f),  which  will  lead  to  an  estimate 
of  the  prior  distribution.  The  point  process  approach  has  the  great  advantage  that  it  leads 
to  a  recursive  estimator  that  is  easily  implemented  and  can  be  analyzed  theoretically. 

Decision-directed  detection  has  been  used  in  many  contexts  [20, 28. 17, S, 14, 10, 16]  which 
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deal  with  the  problem  of  simultaneously  detecting  a  signal  and  estimating  its  parame¬ 
ters  using  decision-directed  schemes.  Other  researchers  [6,26,12,27,13]  provide  analyses 
of  decision-directed  procedures  for  estimating  the  prior  distribution,  i.e.,  decision-directed 
empirical  Bayes  procedures.  In  [30,32]  nonstationary  empirical  Bayes  procedures  were  first 
introduced,  and  these  were  combined  with  signal  parameter  estimation  in  [30,31]. 

Control  Strategies 

The  signal  processing  information  flow  illustrated  in  Figure  1.1  provides  the  capability  of 
feeding  back  signal  structure  information  to  the  sensor  arrays  as  well  as  to  the  detection 
block.  With  the  empirical  Bayes  approach,  only  the  prior  distribution  of  the  target  oc¬ 
currences  is  adapted,  but  the  statistical  description  of  the  signal  itself  or  the  noise  is  not 
modified.  Furthermore,  the  allocation  of  the  sensors  is  not  adjusted  in  any  way  to  accom¬ 
modate  either  the  probability  structure  of  the  signal  environment  (i.e.,  the  occurrences  of 
threat  signals)  or  of  the  structure  of  the  signal  and  noise.  Thus,  in  addition  to  empirical 
Bayes  decision  strategies,  there  arises  the  potential  for  spatio-temporal  adaptive  control  of 
the  collectors  and  of  the  detector,  including  the  following: 

•  Signal  extraction  from  ice  noise.  In  collection  environments  where  the  noise  environ¬ 
ment  contains  impulsive  noise,  the  probability  distribution  of  the  signal  with  respect 
to  its  frequency  content  may  be  used  to  discriminate  between  threat  sources  and 
benign  (i.e.,  natural)  sources. 

•  Mixture  process  modeling.  Impulsive  noise  fields  may  be  modeled  as  mixture  pro¬ 
cesses,  wherein  the  noise  process  is  distributed  as  a  convex  linear  combination  of. 
say,  Gaussian  noise.  Decision-directed  procedures  may  be  used  for  estimating  the 
mixture  parameter. 

•  Resource  allocation.  Knowledge  of  the  multivariate  signal  occurrence  probability  is 
potentially  valuable  in  making  decisions  concerning  the  use  of  available  collection  re¬ 
sources.  For  example,  it  may  be  possible  to  perform  adaptive  beamsteering  to  cover 
regions  of  high-threat  probability  more  thoroughly,  and  thereby  positively  establish 


1-8 


the  threat  status  of  the  signal.  Additionally,  this  knowledge  may  motivate  a  deci¬ 
sion  to  employ  additional  collection  resources  that  may  be  held  in  reserve  (due.  for 
example,  to  power  limitations)  for  high-threat  situations. 


•  Feedback  of  classification  decisions.  Once  sources  have  been  classified  as  to  their 
frequency  content  or  signature,  this  information  can  be  supplied  to  the  detectors  to 
improve  detection  performance  in  the  harmonically  related  bins  represented  in  the 
classification  decision.  Tonal  noise  which  is  nonharmonic  would  be  classified  as  such 
and  those  classes  without  harmonic  structure  could  be  given  a  lower  weighting  in 


sensitizing  the  detectors. 

1.3  Summary  of  Results  and  Conclusions 

Technical  Results 

The  results  of  this  study  include: 

1.  The  formulation  of  a  decision- directed  empirical  Bayes  detection  strategy  for  adap¬ 
tively  tuning  the  decision-rule  to  match  the  observed  characteristics  of  the  signal 
environment.  This  decision  rule  results  in  a  generalized  likelihood  ratio  test  wherein 
the  a  prion  distribution  is  modeled  as  a  finite-state  Markov  chain  that  is  estimated 
or  tracked  as  a  function  of  past  decisions. 

2.  The  develpoment  of  a  new  algorithm  to  perform  spectral  probability  estimation 
via  a  bank  of  scalar  decision- directed  empirical  Bayes  detectors  and  estimators.  This 
algorithm  has  application  in  a  signal  acquisition  and  analysis  scenario  when  the 
probability  of  signal  presence  at  certain  frequencies  is  a  critical  surveillance  factor. 
In  many  sonar  applications,  the  spectral  content  of  a  signal  is  the  prime  characterist  ic 
employed  to  discriminate  between  threat  sources  and  friendly  sources. 

3.  The  joint  estimation  of  signal  distribution  parameters  simultaneously  with  the  estima¬ 
tion  of  the  prior  distribution.  The  empirical  Bayes  approach  requires  the  estimation 


of  the  prior  distribution,  but,  classically,  assumes  that  the  conditional  distributions 


£ 


are  known.  This  analysis  provides  an  algorithm  for  estimating  the  signal  magnitude 
and  noise  variance  as  well. 

Conclusions 

The  conclusions  of  this  study  are 

1.  Decision-directed  empirical  Bayes  procedures  in  spectral  probability  estimation  have 
been  shown  to  be  useful  in  establishing  the  probability  of  signal  presence  at  given 
discrete  frequencies.  Using  simulated  data,  a  number  of  test  scenarios  have  been 
conducted  and  the  detector  performance  has  been  evaluated.  These  simulations 
have  shown  that  effective  estimates  of  the  signal  probability  spectrum  are  obtained 
at  various  signal-to-noise  ratios.  These  results  have  been  presented  as  probability 
surfaces  plotted  against  time  and  frequency.  The  problem  of  runaway  is  demonstrated 
when  signal  parameters  are  also  required  to  be  estimated,  but  is  negligible  for  the 
case  when  the  prior  probability  only  is  estimated.  Averaged  probability  estimates 
obtained  through  Monte  Carlo  analysis  demonstrate  high  confidence  in  the  resulting 
probability  information. 

2.  The  decision-directed  rule  effectively  tracks  time- varying  prior  distributions.  A  num¬ 
ber  of  time-varying  prior  distributions  have  been  simulated  and  it  has  been  shown 
that  these  rates  may  be  effectively  tracked  with  a  decision-directed  approach.  The 
most  obvious  feature  of  these  estimated  rates  is  a  time-lag  which  is  a  feature  typical 
of  real-time  processing.  The  estimator  is  causal  and,  consequently,  a  few  samples 
are  required  to  lock  on  to  the  new  rate.  This  lagging  phenomenon  cannot  be  elim¬ 
inated  with  real-time  processing;  only  noncausal  processing  involving  a  smoothing 
algorithm  is  capable  of  removing  such  phenomena. 

3.  The  performance  of  the  decision-directed  empirical  Bayes  detector  is  compared  to  the 
standard  Bayes  case,  where  the  prior  is  exactly  known.  It  is  shown  that  as  the  signal 
to-noise  ratio  increases,  the  performance  as  measured  by  the  total  probability  of  error 
for  the  empirical  Bayes  approach  approaches  that  of  t lie  standard  case.  A  striking 
aspect  of  the  simulations  studied  is  that  the  additional  complication  of  estimating 
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Chapter  2 

Technical  Approach 


2.1  Decision  Theory  Background 

In  order  to  estimate  the  a  priori  probabilities  associated  with  the  various  frequency  com¬ 
ponents  of  a  given  signal,  we  assume  the  following  model:  Let  y(t)  be  a  received  signal 
composed  of  possibly  harmonically  related  sinusoids  contaminated  with  additive  white 
Gaussian  noise.  Thus  we  can  represent  y(t)  as  summation  of  sinusoids  and  noise  as  fol- 


y(t)  =  ^a,(t)A,{t)  cos(u>,t  -f  <Pi)  +  v(t).  (2.1) 

i=i 

Where  is  the  frequency  of  signal  i.  0,  is  the  phase  angle  for  signal  i,  uniformly  distributed 
from  0  to  27T.  A,(t)  is  the  time- varying  amplitude  of  signal  i.  and  v{t)  is  additive  Gaussian 
white  noise.  The  number  of  sinusoids  present  is  denoted  by  P.  where  P  has  no  upper 
bound.  The  a,(  t)  are  discrete-time  point  processes  (DTPP’s)  describing  the  signal  presence 
at  time  t.  In  estimating  the  a  prion  probability  of  the  signal  presence,  we  are  then  actually 
estimating  the  rate  (i.e.  the  expected  value)  of  the  DTPP  governing  the  signal  presence 
at  a  given  frequency.  This  model  is  therefore  comprehensive  enough  to  admit  all  forms 
of  compositions  of  sinusiods  at  given  frequencies  when  there  is  noise  present.  This  model 
could  also  be  used  to  investigate  other  phenomena  such  as  frequency-hop  spread-spectrum 
communication  systems  and  frequency-multiplexed  communication  channels. 

More  generally,  we  can  describe  the  staff'  of  nature  as  the  set  of  hypotheses  defined  by: 

H  .  o  „  :  S,’1  ns-n...ns,y 

where  Sj  is  the  event  that  a  Mgnal  is  present  at  at  frequency  The  a.  are  binarv- valued 


indices  representative  of  the  logical  operation  denoting  signal  presence  at  u:}  defined  as 


follows: 


Cal  — 
- 


Sj  if  a:  =  1 


l  Sj  if  ctj  =  0 

where  Sj  is  the  logical  complement  of  Sj.  Let  the  intersection  of  all  the  S°’  be  denoted  a 
superclass  set  and  let  it  be  written  as 


S?1  n  S“ 2  n  •  •  •  n  Spp  =  sa'  "ap. 


Thus,  the  model  given  in  Equation  2.1  yields  observations  drawn  from  the  set  of  2P  mu¬ 
tually  exclusive  superclass  variables.  Thus,  the  observations  y(t)  may  belong  to  any  of  the 
superclasses  at  a  given  time  t,  and  furthermore,  the  superclass  assignment  characterizing 
the  observations  may  vary  with  time. 

The  problem  scenario  may  be  viewed  in  the  classical  framework  of  detection  theory. 
We  will  assume  that  there  are  P  sinusoids  present  and  that  there  is  little  or  no  reliable  a 
priori  information  concerning  the  probability  distribution  of  the  signal  classification.  We 
will  not  assume,  however,  that  the  signal  classifications  are  independent. 

In  order  to  make  subsequent  discussion  more  lucid,  at  this  point  we  introduce  notational 
conventions  to  be  employed  in  the  remainder  of  the  report.  Since  the  problem  is  concerned 
with  estimation  of  frequency  content  we  use  a  discrete  Fourier  (DFT)  transform  to  obtain 
frequency  contem  information  about  the  time-domain  observations  y(f).  Justification  of 
this  approach  is  given  later.  Let  us  denote  the  transformed  observations  in  a  given  DFT 
bin  as  Yk(tf)  where  k  is  the  bin  index.  Due  to  the  fact  that  the  DFT  information  is 
given  not  only  for  discrete  time  t.  but  for  block-quantized  time,  we  use  the  £  subscript  to 
indicate  the  block  index  at  which  analysis  is  being  performed.  Therefore,  we  shall  refer 
to  the  observations  from  now  on  as  Yk(tf)  being  the  data  from  the  k th  DFT  bin  taken  at 
tune  if.  These  observations  can  be  expressed  in  vector  form  as 

We  proceed  with  an  empirical  Bayes  approach,  and  estimate  sequentially  the  a  prion 
distribution  of  the  signal  for  use  in  an  A/ -ary  decision  problem.  We  follow  the  philosophy 
espoused  by  [22]  that  there  exists  an  unknown  a  priori  distribution  on  the  signal  structure, 
and  this  distribution  may  be  discovered  by  processing  the  observations.  Y(tf),  over  time. 
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The  decision  function,  D(Y),  then,  is  adaptive,  in  the  sense  that  £)(•)  will  depend  on  past 
observations,  i.e., 

£>[Y(M]  =  £>[Y(s),  s  <  t( ;  Y(f/)]  (2.4) 

such  that  action  £>[Y(f<)]  is  a  function  of  all  past  observations,  Y(s),  s  <  t(.  This  approach 
may  be  altered  somewhat  by  rendering  the  decision  function  as  a  function  of  past  decisions 
(which  is  a  function  of  past  observations)  rather  than  as  a  function  of  past  observations 
directly.  Such  a  decision-directed  rule  has  the  form 


D[Y(U)\  =  D{D[Y(s)],s  < 


(2.5) 


I 


Decision  rules  of  the  form  (2.4)  may  be  termed  “feed-forward”  decision  rules,  with  infor¬ 
mation  flow  as  depicted  in  Figure  2.1,  since  the  past  data  are  fed  into  a  decision  function 
generator  which  in  turn  modifies  the  decision  rule.  Decision  rules  of  the  form  (2.5)  may  be 


Figure  2.1:  Feed-Forward  Rules 


termed  “feedback”  decision  rules,  with  information  flow  as  depicted  in  Figure  2.2.  since  the 
past  decisions  are  fed  back  into  a  decision  function  generator  which  modifies  the  decision 
rule.  Clearly,  a  more  general  decision  rule  may  be  formulated  which  employs  both  feedback 
and  feed-forward  information  flows,  which  represents  an  obvious  generalization  of  Figures 
2.1  and  2.2. 

The  following  discussion  is  a  general  representation  and  uses  the  time-domain  obser¬ 
vations  y(t),  but  similar  constructions  exist  for  transformed  signals  of  any  type.  Adaptive 
decision  rules  of  the  types  described  above  may  be  used  to  improve  performance  over  non- 
adaptive  procedures,  since  they  may  be  used  to  estimate  the  a  •priori  distribution.  To 


Figure  2.2:  Feedback,  or  Decision-Directed,  Rules 


illustrate,  consider  the  binary  hypothesis  problem 

H0  :  y(t)  =  n{t) 

Hx  :  y(t)  =  s{t)  n{t) 

where  s(t)  is  known  and  n(t )  is  of  known  distribution.  If  the  a  prion  distribution  of  the 
occurrence  of  s{t)  were  known,  the  optimal  decision  rule  (in  the  sense  of  minimizing  the 
probability  of  error)  would  be  of  the  form 


*%(*)]  = 


1  if7r/(y(t)|ff,)>( 

0  otherwise 


-  w)f(y(t)\H0) 


where  tt  is  the  a  prion  probability  of  signal  occurrence  and  f{y(t)\H-l)  and  f(y(t)\H{)) 
are  the  probability  distributions  of  y(t)  under  hypotheses  H i  and  H0.  respectively.  The 
problem  we  face  is  that  n  is  not  known.  The  danger  in  arbitrarily  guessing  the  value  of  t 
is  well  known,  but  is  illustrated  here  for  completeness.  Let  R(D,  7r)  denote  the  Bayes  risk 
function  under  a  distribution  for  tt.  The  Bayes  decision  rule  will  be  one  that  minimizes 
this  function.  Let  DK(-)  denote  such  a  rule.  The  Bayes  envelope  function,  r(-)  =  R(  D -.  rr  i 
represents  the  minimum  risk  envelope  when  the  decision  rule  is  specified  with  the  correct 
value  of  the  a  prion  distribution.  For  the  binary  decision  rule  in  this  example,  the  Bayes- 
envelope  function  is  displayed  in  Figure  2.3.  Now  it  can  be  seen  what  happens  when  the 
prior  is  in  error.  Suppose  the  true  a  prion  distribution  is  t r*  and  the  assumed  n  prum 
distribution  is  it'  ^  7r*.  Clearly,  D T<  can  lead  to  excessively  large  risks,  and  it  would  perhaps 
be  prudent  to  employ  a  minimax  rule  (denoted  by  7r V/  in  the  figure)  which  bounds  the  risk 
for  all  values  of  n .  Robbins  [22]  showed,  in  a  classical  result,  that  empirical  Bayes  rules 
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tr'  *\i  7T’  1  tt 

Figure  2.3:  Ba\  s  Envelope  Function 


The  use  of  feedback  decision  rules  for  estimation  of  the  prior  is  perhaps  first  treated 
analytically  1  by  Davisson  and  Schwartz  [6],  wherein  decision  feedback  algorithms  are  pro¬ 
posed  and  runaway  (a  divergence  phenomenon  which  may  occur  if  a  sequence  of  detection 
errors  cause  the  estimate  of  the  prior  to  converge  to  zero  or  unity,  thereby  causing  the 
decision  rule  to  go  unstable)  probabilities  are  bounded.  The  resulting  decision  rule  is  of 
the  form 


D[D(s),s  <  t\  y(f)]  = 


1  if  Mt)f(y(t)\Hx)  >  (1  -  fc{t))f(y(t)\H0) 
0  otherwise 


where  if (t)  is  an  estimate  of  7t  given  { D ,  <t). 

Although  a  complete  discussion  of  feedback  and  feed-forward  decision  rules  will  not  be 

attempted,  it  may  be  instructive  to  comment  briefly  on  some  of  the  differences. 

'Earlier  researchers  [20,28,17]  successfully  used  decision-directed  detectors,  hut  described  performance 
empirically. 
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1.  The  feed-forward  decision  rules  are  asymptotically  sub-minimax.  It  can  be  shown 
that,  under  appropriate  technical  conditions  [24],  the  risk  using  an  unbiased  estimator 
of  ~  will  converge,  asymptotically,  to  the  Bayes  envelope. 

Xo  such  global  results  are  available  for  the  feedback  decision  rules,  due  to  the  positive 
probability  of  runaway.  It  can  be  shown,  however,  that  the  probability  of  runaway 
can  be  bounded,  and  these  bounds  are  double  exponentially  tight  [6].  Consequently, 
barring  runaway,  the  risk  using  an  unbiased  feedback  estimator  of  n  will  converge  to 
the  Bayes  envelope. 

2.  Classical,  or  feed-forward  empirical  Bayes  rules  are  quite  complex,  whereas  decision- 
directed  rules  are  extremely  simple.  Consequently,  they  are  more  attractive  for  use 
and  are  more  easily  implemented. 

3.  The  feed-forward  rules  are  based  on  the  assumption  that  the  a  priori  distribution 
is  stationary  (time-invariant).  Indeed,  stationaritv  is  the  vert'  basis  of  the  classical 
empirical  Bayes  approach.  As  noted  by  [6],  many  applications  where  the  prior  is 
unknown  are  highly  likely  to  be  nonstationary.  and  it  will  be  necessary'  to  "track 
the  nonstationary  prior.  In  such  situations,  the  feed-forward  rules  may  be  intractable, 
and  the  use  of  feedback  rules  may  be  the  only  viable  approach. 

The  objective  of  this  investigation  is  to  estimate  the  multivariate  probability  distribu¬ 
tion  function  of  the  signal  classification,  i.e..  the  probability,  at  each  time  t.  that,  the  signal 
is  in  any  of  the  possible  classification  states.  We  will  assume  that  the  distribution  func¬ 
tion  may  be  time- varying  and,  therefore,  we  are  required  to  obtain  equations  of  evolution 
for  this  multivariate  distribution.  The  observations  that  are  available  for  estimating  this 
distribution  are  the  outputs  of  the  sensors.  We  shall  employ  a  feedback  approach,  and 
estimate  the  distribution  based  upon  the  past  decisions,  thereby  developing  a  rule  of  the 
form  expressed  by  (2.5).  wherein  the  multivariate  probability  distribution  function  of  the 
detection  events  is  estimated  and  used  to  formulate  the  empirical  Bayes  decision  rule. 


-r. 


2.2  Frequency  Distribution  Estimation 


Initially,  in  order  to  estimate  the  probability  distribution  of  the  frequencies  present  in  the 
signal  y(t),  we  must  transform  the  time-domain  data  to  the  frequency  domain  or  do  some 
other  form  of  spectral  analysis  to  establish  the  energy  content  of  the  signal  at  a  given 
time.  We  will  not  attempt  a  complete  exposition  of  spectral  estimation  methods  in  this 
report,  but  for  the  sake  of  completeness,  we  illustrate  the  problem  by  giving  the  following 
examples: 

•  Short-time  Fourier  analysis  [11]  is  performed  by  computing  a  running  average  of  dis¬ 
crete  Fourier  transforms  of  the  input  signal.  This  method  is  useful  because  of  the 
computational  advantages  involved  in  using  the  Fast  Fourier  Transform  (FFT)  algo¬ 
rithm  to  compute  the  transformed  data  and  the  simplicity  of  the  averaging  process. 
An  implicit  assumption  made  when  employing  Fourier  analysis  is  that  the  signal  is 
inherently  stationary  -  an  assumption  which  cannot  be  completely  valid  for  any  phys¬ 
ically  realizable  situation,  but  is,  in  fact,  approximately  satisfied  for  many  signals, 
especially  during  short  time  intervals. 

•  The  Wigner  distribution  [5,4,3,9,19]  has  recently  gained  much  popularity  as  a  tool 
in  spectral  analysis  due  to  the  fact  that  it  includes  the  time  parameter  in  it's  for¬ 
mulation.  It  involves  the  computation  of  signal  energy  at  discrete  frequencies  and 
therefore  is  limited  in  resolution,  but  the  frequency  resolution  is  not  limited  to  the 
number  of  time  samples  as  is  the  case  with  the  discrete  Fourier  methods.  The  FFT 
algorithm  can  be  used  to  speed  up  computation  of  the  Wigner  distribution,  but  the 
computational  burden  is  still  greater  than  the  short-time  Fourier  analysis  methods. 

•  A  plethora  of  so-called  high-resolution  estimation  techniques  exist  which  are  not  lim¬ 
ited  to  the  resolution  of  energy  at  discrete  frequencies,  but  are  able  to  resolve  energy 
in  a  continuous  spectrum.  Although  many  variations  exist,  these  methods  largely  rely 
on  an  eigenvalue-eigenvector  decomposition  of  the  correlation  or  covariance  matrix 
generated  from  the  signal  data.  Some  of  these  methods  are  more  computationally 
burdensome  than  others,  but.  as  a  class,  they  do  involve  a  greater  computational  cost 


than  either  of  the  above  two  methods.  Some  of  the  more  well  known  high-resolution 
methods  are  the  MUSIC  algorithm  [25],  a  variety  of  modifications  of  this  approach 
[2],  and  ARM  A  (auto- regressive,  moving  average)  statistical  signal  models  [23], 

For  our  purposes,  we  choose  the  discrete  Fourier  transform  method  as  an  appropriate 
procedure  to  examine  the  spectral  energy  present  during  a  given  time  interval  which  we 
may  designate  as  a  block  of  data  samples.  During  the  time  block  then,  we  transform  the 
input  time  data  sequence  y(t)  into  a  frequency  data  sequence  designated  as  Yk(tf).  We 
can  alternatively  represent  this  sequence  as  a  frequency  data  vector  Y (tf)  for  the  block  at 
time  t(. 

Let  y  £  Cp  denote  the  signal  space,  where  C  is  the  complex  plane.  And  let  Y (tf)  6  y 

denote  the  observed  frequency  data  at  time  block  tf ,  for  t(  —  0,1 .  Let  5  denote  a 

classification  set  for  the  signal  Y.  and  define  a  decision  function 

d  .y  {0.  1) 


as  a  binary- valued  function  mapping  the  observed  signal  into  a  abstract  classification  space. 


That  is. 


D[Y(t()}  = 


1  S  occurs  at  time  t / 

0  S  does  not  occur  at  time 


For  the  P  classes  Sj.j  =  1 . P  given  above,  we  may  define 


^»-{J 


1  Sj  occurs  at  time  t, 

0  5,  does  not  occur  at  time  tf 


and  consider  the  P-vector  decision  function 


D(  ■)  =  [£>,(■)•  DP(-)]7 


We  note  that  the  classification  sets  S7  need  not  be  disjoint  (i.e..  Y(t,)  may  belong  to  any 
or  all  of  the  classes).  We  will  assume  that  Y (tf)  must  be  classified  into  at  least  one  of  the 
Sj.  Alternatively,  using  the  superclass  notation  given  above,  the  decision  problem  reduces 
to  selecting  the  one  superclass  possessing  the  proper  attributes. 

At  each  time,  tf ,  we  are  confronted  with  an  M- ary  decision  problem  involving  the 
hypotheses 

H„,  ..ap  or  op€  {0.  J}p 
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where  Sai"'ap  is  defined  in  Equation  2.3. 

The  Bayesian  approach  to  this  problem  is  to  choose  the  hypothesis  H ;  for  which  the 
likelihood  function 

Tra'-ap(t()f(Y(t()\Sa'-°p),  (2.7) 

is  maximized,  where  f(Y(t()\Sai  "ap )  is  the  distribution  of  Y (te)  under  classification  <Sai '"ap 
at  time  t(,  and 

na'"ap(tt)  =  P{Sa^0,p\t(}  (2.8) 

is  the  a  ■prion  probability  mass  function  for  each  of  the  events  Sai  'ap  at  time  t(.  The 
decision-directed  empirical  Bayes  approach  is  to  estimate  TTa'"'ap(ti)  by  means  of  the  past 
decisions,  {D(s),s  <  tf}. 

Ideally,  we  would  wish  to  estimate  n a'"'ap(t()  directly  in  order  to  exploit  any  infor¬ 
mation  contained  in  the  harmonic  relationships  which  are  to  be  expected  in  the  analysis 
of  complex  waveforms.  However,  as  an  initial  simplification  of  the  problem,  we  address 
the  case  where  it  is  assumed  that  we  can  factor  tt a'"'ap(t()  into  independent  marginal 
probabilities  as  follows: 

7 Ta'  -ap{te)  =  nai  tt"2  •  •  •  irap . 

We  later  address  methods  which  treat  the  more  general  case  involving  interdependence 
between  the  various  frequency  components.  Thus,  these  marginal  probabilities  for  the 
various  signal  classes  are  the  probabilities  to  be  estimated.  This  model  gives  rise  to  a  set 
of  scalar  estimators  each  of  which  provides  estimates  for  a  given  marginal  probability,  and 
is  an  attractive  approach  to  the  problem  since  the  dimensionality  of  the  problem  remains 
relatively  low  compared  with  the  high  dimensionality  one  would  encounter  when  estimating 
even  a  portion  of  the  joint  probability  structure.  The  fact  that  the  noise  in  each  frequency 
bin  is  uncorrelated  with  other  bins  also  lends  support  to  this  approach  (see  Appendix 
B.3).  However,  a  significant  amount  of  information  about  inter-signal  correlation,  if  there 
is  any,  is  automatically  lost  in  this  process.  Alternatives  to  be  explored  in  mitigating  this 
problem  are  a  joint  probability  conditional  factorization  similar  to  that  done  by  Stirling 
and  Swindlehurst  [34],  and  a  new  method  using  classification  decision  feedback  initiated 
for  this  research,  but  not  yet  fully  developed  as  a  theoretically  robust  strategy. 
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2.2.1  Probability  Models 

As  a  practical  matter,  even  though  the  signal  model  admits  P  different  sinusoids  in  the 
received  signal,  when  using  the  discrete  Fourier  transform  (DFT)  as  our  spectral  analysis 
tool,  the  frequency  resolution  is  dependent  on  the  number  of  time  points  taken  in  the 
analysis  window  of  y(t).  Let  the  number  of  time  points  in  the  analysis  window  be  denoted 
by  M.  Due  to  the  periodicity  assumption  inherent  in  the  frequency  domain,  only  y  +  1 
unique  frequencies  are  represented  by  the  elements  of  Y(f^).  Therefore,  the  dimensionality 
of  the  problem  is  reduced  if  y  +  1  <  P  or  increased  if  y  +  1  >  P.  Since  we  admit  that  P 
1  may,  in  fact,  be  a  very  large  number  the  chance  of  dimensionality  reduction  is  actually  quite 

small.  In  any  case  however,  it  is  not  guaranteed  that  the  P  different  sinusoids  are  such  that 
they  fall  into  distinct  bins  of  the  DFT.  This  problem  can  be  alleviated  by  increasing  the 
>  size  of  the  analysis  window,  but  the  sinusoids  still  cannot  be  ensured  to  be  in  completely 

separate  bins.  Thus,  we  consider  that  energy  present  in  a  given  DFT  bin  is  due  to  a  single 
sinusoid  at  a  frequency  contained  in  the  frequency  range  of  that  bin. 

^  Now,  define  Nk(tt)  to  be  the  DTPP  generated  by  the  decision  process  operating  on  the 

kth  component  of  the  observation  vector  at  time  t(.  In  other  words,  denote 

Nk(tt)  =  Dk[Yk(te)\,  k  =  0, . . . ,  ^  t(  =  £M/2.  t  =  0. 1, . . .  (2.9) 

where  the  usage  of  y  is  explained  above.  With  this  understood,  and  for  the  sake  of  simpler 
notation,  we  now  designate  L  =  y .  Let  denote  a  a-field  generated  by  all  of  the  factors 
that  may  affect  the  distribution  of  the  process  N(t()  at  time  t(,  and  define  the  marginal 
*  conditional  probability  mass  function 

■  Pyk(tt)(ak\Btt_l )  =  P{Xk(t()  = 

i  That  is,  we  say  that  PN,<t()(Qk\Bt,_x)  is  the  conditional  probability  of  signal  energy  being 

present  in  the  fcth  DFT  bin. 

Now,  we  can  represent  the  above  a  prion  probability  as  the  rate  of  the  DTPP  ,\\.(  t, ) 
as  follows: 

Ak(M  =  P{.V*(M  =  }  k  =  0 . L  .  (2.10) 
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At  this  point,  it  must  be  emphasized  that  the  rates  \k{tt)  do  not  correspond  directly  with 
the  TTai"'ap  given  above  unless  all  sinusoids  are  assigned  to  separate  DFT  bins.  Therefore, 
we  may  not  completely  describe  any  a  •priori  probability  distribution  due  to  the  resolution 
of  the  spectral  analysis  tools  employed.  Since  we  desire  to  estimate  the  probability  struc¬ 
ture  via  these  marginal  probabilities  or  rates,  we  conclude  that  we  must  avail  ourselves  of 
an  estimator  for 

To  do  this,  we  require  a  probabilistic  model  to  characterize  this  rate  and  a  model  to 
describe  its  evolution  in  time.  A  physically  meaningful  and,  at  the  same  time,  mathemat¬ 
ically  tractable,  model  for  A k(t()  is  to  represent  it  as  a  finite-state  Markov  chain.  Such  a 
model  may  be  contrasted  with  a  continuous  model  for  as  follows: 

1.  The  Markov  structure  permits  the  evolution  of  A k(te)  to  be  treated  probabilisti¬ 
cally  via  the  state  transition  matrix.  This  representation  may  be  contrasted  with  a 
stochastic  differential  or  difference  equation  for  A*.(^)  which  may  be  difficult  to  treat 
analytically.  The  introduction  of  a  Markov  model  permits  the  application  of  an  exact 
MMSE  estimator  for  Xk(U)  with  a  recursive  estimator. 

2.  The  finite-state  model  permits  limits  on  the  range  of  A k(t()  to  be  imposed,  and 
the  rate  may  be  restricted  to  the  expected  domain  of  the  parameter  space.  Such 
a  limitation  may,  for  example,  be  chosen  to  reduce  or  eliminate  the  probability  of 
runaway,  which  is  a  possibility  in  the  decision-directed  estimation  context. 


Under  the  Markov  structure,  we  may  represent  A k(t()  els  a  finite-state  vector  Markov  chain 
with  states  . . .  ,  pr(M  which  can  be  expressed  in  vector  form  for  each  bin  k  as 


Pk(tf)  = 


Pk,2(t() 


L  pkAU) 

Define  a  Markov  state  vector  for  each  bin  as  xk(t()  =  . .  .  j-r(^)]T  where 


tj.  \  _  f  1  if  A k{te)  =  pk',(t.()  . 

e  1  0  otherwise 


r. 


(2.11 : 


•.  <*, 

•.v. 


Thus,  we  can  represent  \k(t()  in  inner  product  form  as 

Afc  {tf)  =  pT{t()xk{t.e).  (2.12) 

The  state,  therefore,  characterizes  the  probability  distribution  of  the  process  .V k[tf).  In 
other  words,  knowledge  of  xk(te)  specifies  the  probability  mass  function  p  vfc< f#)(  ak ). 

The  evolution  of  the  state  may  be  characterized  by  means  of  a  state  probability  tran¬ 
sition  matrix 

qtJ(te)  =  P{xk,j(te+i)  =  l\xkJtt)  =  1}  i.J  =  1 . r  '2.13. 

Let  Q (t()  =  [qtJ(t()}  denote  the  state  transition  matrix.  Then  the  state  evolves  as 

xk{t(+ 1)  =  Q T{tf)xk(t()  +  u k(tf)  '  2.14  ' 

where  ufc(M  =  xk(te+l)  —  QT(tt)xk(t().  Define  the  family  of  a -fields 

&k,te  =  a  {iVfc(s  ).  s  ^  tf-  Xfc(.s),S  <  L+1}; 

we  observe  that  u k(te)  £  BkAl  and  E(uk( )  =  0.  Consequently.  {  u* }  is  a  martingale 
difference  (MD)  process  with  respect  to  the  family  of  cr-fields  (£>*..}  (see  Appendix  A'. 
Notationally,  we  say  {u/t}  is  a  {5t}-MD. 

Since  \k  £  [0,  1]  we  must  have  the  elements  of  each  pk  vector  above,  bounded  by  unity. 

i.e.. 


77* 


.  -  i 


;j| 


f.4 


Si 


SI 


% 


-■n 


0  <  pk ,  <1.  k  —  0 . L.  j  =  1. ...  r. 

2.2.2  Estimation  Procedure 

•  A  useful  characterization  of  the  process  {.Vfc}  is  to  obtain  its  Doob  decomposition  with 

respect  to  { } .  Recall  (see  Appendix  A)  that  the  Doob  decomposition  of  a  process  {.\\. } 
with  respect  to  a  family  of  a-fields  {Bk}  is  the  representation 

®  yk(t()  —  A  k(tf)  +  wk{tf) 

where  A k(tf)  is  a  -predictable  process  (i.e.,  A k(tf)  £  Bk, f  i  for  all  t,)  and  u'k(t,)  is 
a  (G/j}-\ID  sequence  (i.e..  ?rjt(M  €  Bk.t,  and  E(u'k{tf)\Bk,t-))  —  0).  From  the  above 
^  development 

Ait( //)  =  E(7J  k(tf)\Bk  t(  l ). 
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and  if  we  define 

u~k(  tf)  =  .V,  {ft)  --  E{  A  tf  )|5fc.tf_,  ) 

then 

JVfc(ff)  =  Afc(/<)  4-  u-fc(#/)  =  p\{  tf)Xk(  t,-)  ■+■  u'kitf)  (2.1  o' 

is  the  (unique)  Doob  decomposition  of  {*Yk}  with  respect  to  ( } 

Equations  (2.14)  and  (2.15)  represent  a  type  of  state-space  model  for  the  system  under 
study.  The  dynamics  equation,  (2.14)  describes  the  evolution  of  the  process  xv <  / ,  *  over 
time,  and  is  analogous  to  a  linear  difference  equation  driven  by  a  noise  process.  The 
observation  equation  (2.15)  provides  the  relationship  of  the  observed  process  ,\*i  t,  >  to  the 
state,  and  is  analogous  to  the  signal-in-additive-noise  process  familiar  to  linear  estimation 
problems.  Although  these  equations  are  similar  to  their  counteqmrts  in  linear  system 
theory,  they  cannot  be  treated  the  same  way.  since  the  processes  {u* }  and  { >rk  \  are  not 
additive  white  noise  processes. 

If  Xfc (tf)  were  known,  the  problem  of  predicting  Xk(t/)  at  any  time  tf  would  be  solved, 
regardless  of  the  past  history  of  ,Vk(  ).  Unfortunately.  xk(M  is  not  directly  observable: 
only  Xk(te)  is  observed.  We  are  thus  faced  with  the  problem  of  estimating  Xk(t,:)  in  order 
to  render  an  acceptable  prediction  of  .\\(t/h  To  formulate  this  estimation  problem  more 
clearly,  let  us  define  the  family  of  (7-fields  Tk.t,  a-s 

Ek.t,  =  <7{-Vit(s  ).  s  <  tf]  (2.1G) 

and  compute  the  conditional  expectation  of  Xk(te)  given  Tk,t,_x-  To  do  this,  we  draw 
upon  two  fundamental  results  of  martingale  theory,  namely,  Tie  innovations  theorem  and 
the  representation  theorem.  These  theorems  are  stated  and  discussed  in  Appendix  A. 
Application  of  these  theorems  results  in  a  nonlinear  estimation  procedure  to  obtain  the 
Doob  decomposition  of  {.At}  with  respect  to  {T^}.  yielding 


Xk(tf)  =  Afc(TlT-i)  +  i'k(t-f)  =  p{{t,)Xk(t,'t,- i  1  -  i vM-  '2.17 

where  {i^.}  is  a  {.Tfc}-MD  process  and  ’s  the  conditional  expectation  of  Xq  t,  > 

given 
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The  process  x*(M  modulates  the  rate  of  the  discrete-time  process  Xk{t,)  according  to 
equations  (2.14)  and  (2.15).  We  wish  to  obtain  equations  of  evolution  of  the  process 

xt(f/+i|M  =  E{xk(tf+x)\Xk,tt}  =  E*k-'‘*k(tt+i) 

the  conditional  expectation  of  x(tf+ 1)  given  the  a  -field  Ek,t,  (in  standard  and  an  alternate 
notation).  We  follow  the  results  of  [29],  and  obtain  an  estimator  of  the  form 

x*(t/+i|M  =  E:Fk,'-'xk(t,+  x)  +  (Mt,  Vk)tt(i/k*i'k)7tll/k(t()  (2.18) 

where 

pk(te)  =  ETk‘'xk(t,+x)  -  E^-xfc(#,+  1)  (2.19) 

and 

vk(t<)  =  Xk(te)  -  E^'-'X^t,)  =  .Vfc(M  -  p[(f, )£*■•'-' xt(M.  (2.201 

the  matrix 

(fik,  vk)t(  =  E*1"1-'  nk(tt)vk(tf)  (2  21 ) 

is  the  conditional  covariance  of  pk(tf)  and  vk(tf).  and  the  quantity 

(t/fc.i/fc),,  -  '2.22) 

is  the  conditional  variance  of  vk(tf). 

The  conditional  covariance  (fik.  vk)t,  is  derived  in  Appendix  A.  and  is  the  r  x  1  matrix 

\nk{t,)uk(tf)\  =  Qr(t,)diag  {xfc(<,|f,_,)}p(  t()  -  Q1  ( tf)xk(  t,\t,_x  )xl(tf\tf_l)p{  t, ) 

=  Qr(  Mfdiag  {xfc(f,|#,_,)}  -  xt(f,|f,_,)x[(Mf,_,)]p(M 

(2.23 . 

where  diag  {  }  denotes  a  diagonal  matrix  whose  diagonal  elements  are  composed  of  the 
elements  of  the  vector  argument. 

The  conditional  variance  ( n.  is)tf  is  also  derived  in  Appendix  A.  and  is  the  quantity 

EF"'->\vl(t,)\  =  \K{tf\t(.x)  -  A *(f, *2  2  4 

with 

Afc(ME.i)  =  pl(tf)xk(ft\tr-i).  1  2.25  1 
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Thus,  the  estimator  becomes,  using  (2.23)  and  (2.24). 

=  QT(  #/)**(  M*/-i  )  +  \nk(t/)  v{  f/)]  ■'->  [vk(  t,-)2]j  i'(t, 

The  resulting  rate  estimates  are.  therefore. 

^k(t(+\ |M  =  pTk{U+ 1  )xk(tt+i\t,). 


2.26 
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2.2.3  Covariance  of  Estimation  Error 


The  recursive  estimator  given  by  (2.26)  and  the  corresponding  rate  estimate  given  by  (  2.27) 
represent  the  minimum-mean  square  error  prediction  of  xk{t(+i)  and  \k(tf+\)  given  Ek,tr 
the  past  and  present  data.  We  may  obtain  the  conditional  uncertainty  on  these  estimates 
by  computing  the  conditional  covariance  matrix  of  the  estimation  error 

X/tUf+ilM  =  x*(</+i)  -  xk(t(+l\t() 


i  2.2S) 


Afc(  <<r+ilM  —  ^k(0+ 1 )  —  Afc(  tf+i  \t( ) 

The  estimation  error  covariance  on  xk(tf+x\t,)  may  be  computed  as 

Pk{t,+  l\tf)  =  E*k- 

=  Et>  •fxk(tt+i)xj'(tt+ 1)  -  Xfc(^+i|^)Xfc(/#+i|«/) 

=  E  *-'<diagXfc(^+1)  -  x*(<m.iIMx[(#,+  i|M 
=  d:.  gxk{t(+l\tf)  -  *k(tf+i\t,)xl(t,+  x\t() 
and.  by  (2.27).  we  have  the  covariance  of  the  rate  estimation  error  given  by 
nk(t,+  l\t<)  =  ETk  ,r\k(t,+  l\t()\l{tf+l\tl) 

=  Pk(t(+  i)  [diag  xk(t(+l\tf)  -  xk(t,+  x\tf)xl(  tf+l  |ri)j  p(  t,+ ,  ) 

At  this  point,  some  comments  with  regard  to  the  recursive  estimator  provided  in  Equation 
(2.26)  may  be  appropriate.  Note  that  although  this  estimate  is  recursive  and  possesses 
structure  much  like  a  Kalman  filter,  it  is  not  a  linear  estimator  since  the  gain  matrix 
is  dependent  upon  the  state  and.  hence,  upon  the  data.  Furthermore,  the  covariance 
associated  with  this  estimate  is  a  conditional  covariance,  rathe:  than  an  unconditional 
covariance  as  with  the  linear  case.  Note  also  that  this  covariance  does  nor  obey  a  Riceati 
equation,  but  it  is  true  that  the  (state-dependent)  gain  of  the  estimator  is  proportional 
to  this  covariance,  as  is  the  case  with  the  Kalman  filter.  Although  the  estimation  error 


covariance  provided  above  is  conditional,  if  may  properly  be  used  in  practice  to  as>ess  the 
quality  of  the  estimate  for  xk(t,+  i )  or  Afc(  tf+l  ). 
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2.3  Decision-Directed  Detection 


The  DTPP  estimator  defined  in  (2.26)  provides  an  estimate  of  the  vector  thus 

yielding  a  probability  vector  x*(M^-i)  with  components  representing  the  con¬ 

ditional  probability  that  \k(tf)  =  pk,,{t().  The  conditional  expectation  of  A k(t()  given  the 
cr-field  Tk,tt_x  is 

which  yields  the  rate  estimates  for  each  bin  of  the  DFT. 

2.3.1  Detector  Design 

In  each  of  the  DFT  bins,  two  quadrature  values  are  present  and  designated  as  the  real 
and  imaginary  parts  of  the  complex  number  Y'k(tt).  When  using  DFT  data,  it  is  common 
to  employ  detectors  using  either  the  energy  present  in  a  bin  or  the  square- root  of  the 
energy  in  a  bin.  The  former  approach  gives  rise  to  so-called  envelope-squared  detection 
and  the  latter  approach  is  called  envelope  detection.  In  both  instances,  the  original  time 
data  sequences  are  corrupted  by  Gaussian  noise  which  transforms  to  Gaussian  noise  in 
each  frequency  bin  because  of  the  well-known  property  that  sums  of  Gaussian  processes 
are  Gaussian  processes.  Consequently,  the  individual  bin  contents  also  have  the  form  of 
a  signal  plus  additive  white  Gaussian  noise  as  shown  in  Appendix  B.3.  For  this  signal 
and  noise  description  Whalen  [36]  gives  envelope-squared  and  envelope  detectors  using  \J 
and  Rician  noise  models  respectively.  In  actual  use.  for  the  two- hypothesis  case  the  form 
of  the  detectors  is  identical  for  either  approach,  so  only  the  envelope-squared  approach  is 
illustrated  here. 

At  a  given  time  t(  then,  for  each  bin  in  the  DFT.  define  the  two  hypotheses: 

:  Zk(tf)  =  pk 
Hijt(ff)  :  Zk(tf)  =  Jk 

where  Zk\U)  =  |H(M|2  is  the  energy  present  in  a  given  bin,  pk  is  centrally  \2-distributed 
noise  present  in  bin  k.  and  3k  is  non-centrally  \2-distributed  signal  plus  noise  as  given  in 


2  16 


[36].  Thus,  H0jt(^)  represents  the  noise-only  case  and  H ijc(tf)  represents  the  case  of  signal 
plus  noise. 

The  likelihood  ratio  is  expressed  as  a  function  of  the  a  ■prion  probability  of  each  hy¬ 
pothesis  being  correct  as  well  as  the  hypothesis-conditional  probabilities.  The  decisions 
for  the  true  Bayes  case  are  then  are  given  by  the  likelihood  ratio  test  (LRT)  as  follows: 


Dk[Zk(tt)\ 


1  if  \k(tt)p(Zk(tt)\Hi,k(t())  >  (1  -  \k(te))p(Zk(te)\Ho'k{t()) 
0  otherwise 


where  the  conditional  probabilities  p(Zfc(f*)|H0,fc(£/))andp(.£k(t*)|Hijt(t;))  are  represented 
by  central  and  non-central  \ 2  distributions  respectively.  The  empirical  Bayes  philosophy 
is  to  use  estimates  of  the  prior  probabilities  in  place  of  the  actual  probabilities.  If  the 
rate  estimates,  A k(te)  are  used  and  estimates  for  the  parameters  present  in  the  conditional 
probabilities  are  employed,  we  say  that  this  is  a  generalized,  likelihood  ratio  test  (GLRT): 


Dk[Zk(U)]  = 


1  if  \k{tf)p(Zk{t()\liuk(te))  >  (1  -  \k(t())p(Zk(t()\Ho.k{tf.)) 

0  otherwise 


>.30) 


The  complete  derivation  of  the  likelihood  ratio  is  presented  in  Appendix  B.  but  the 
result  is  that  we  can  construct  the  log- likelihood  function  as  a  threshold  Tk  for  each  bin  as 


Tk  =  log  Alt  -  +  \og(Io((qkik)'/2] 


log(  1  -  Afc) 


[2.31) 


where  /0  is  the  modified  Bessel  function  of  the  first  kind,  order  0.  and  7*  is  the  noncentral 
parameter  defined  as  7^  =  2(1  where  (k  is  the  signal  envelope  amplitude  for  bin  k.  The 
quantity  q k  is  the  \2  distributed  observation  defined  as: 


qk  = 


fl 


+  -f 


flk 


for  fi.k  —  Real  Yk(tt)  and  f2.k  =  Imaginary  Thus. 

nk(tf)  = 

Typically,  in  order  to  more  efficiently  compute  the  detection  threshold,  we  would  em¬ 
ploy  small-  and  large-argument  approximations  for  the  modified  Bessel  function  and  use 
these  values  in  the  log-likelihood  ratio  test.  Since  even  the  approximations  desired  are 
in  exponential  form,  the  values  arrived  at  are  usually  just  summations.  In  a  system 


1  if  Tk  >  1 
0  otherwise 


realization  then,  there  is  a  bank  of  L  =  +  1  scalar  detectors  of  the  form  described  above 

which  generate  a  vector  of  binary  detector  decisions  which  we  may  express  as 


n(M  = 


n0{tf) 


L  nL(tt) 


This  is  illustrated  in  Figure  2.4. 


2.3.2  Bias  Correction 

Unfortunately,  it  is  not  generally  true  that  the  A k(te)  is  an  unbiased  estimate,  and  we  must 
investigate  the  effects  of  this  bias  on  the  performance  of  the  detector  and.  if  necessary, 
explore  methods  of  eliminating  or  reducing  this  bias. 

For  any  partitioning  T.  of  the  decision  space,  we  may  express  A in  terms  of  z°'k{ tf  ) 


A ak{t/.)  =  irak(t()  J  [f(Zk(te)\Hakik(te))  -  f(Zk(t,)\Hak'k(t()]  dZk{ tf  ) 
+  /  f(Zk(t()\Hak,k(t())dZk(te) 

JTa>' 

and,  solving  for  rr c"‘(te)  yields 


ya,)-  /  f(zk(t,)\H°k-  (t,))dzk(t,) 

t, )  =  -7 - 7 - - - - - - =  a(T°k)\ak(t,)  +  h(  T°k  j 

jTak  [f(ZM\Ha^\te))  -  f(Zk(U)\Wk'k(te))\  dZk(t, ) 

(2.32) 

where  a(-)  and  b(-)  are  defined  in  an  obvious  way.  Thus,  the  true  rate  rr  is  expressed  as 
a  linear  function  of  the  detected  rate.  A.  In  general,  a{  )  ^  1  and  b(-)  0.  However,  for 

any  given  decision  region  Tak ,  the  correction  terms  may  be  computed  and  applied.  If  we 
estimate  A ak(t?)  using  the  above  scheme,  we  may  then  express  the  estimate  of  -c'k(t,)  as 


irak(t,)  =  ci(Tak)\ak(t,\t,^)  +  b(T a"). 


This  structure  holds  for  all  values  of  Tnk  and.  in  particular,  holds  when  the  partition 
regions  Tnk  are  specified  by  the  previous  best  estimate  of  the  prior,  namely.  tr(U_,  l 


There  are  a  number  of  issues  to  be  considered  concerning  ihe  removal  of  the  bias.  First, 
it  is  evident  from  the  structure  of  (2.32)  that  a(-)  >  1,  and  therefore  the  variance  on  the 
estimation  error  for  7r°k  will  be  greater  than  the  variance  on  the  estimation  error  for  A  'k. 
thus  bias  may  be  removed  only  at  the  expense  of  increased  uncertainty  in  the  estimate 
Second,  the  integrations  indicated  in  (2.32)  are  extremely  complex,  since  the  integrations 
are  taken  over  m-dimensional  space.  For  the  Gaussian  case,  these  integrals  cannot  be 
evaluated  in  closed  form,  and  the  computational  burden  to  numerically  evaluate  them  is 
considerable.  Consequently,  for  the  present  analysis,  we  simply  neglect  the  bias  and  use 
the  simple  estimator  defined  by  (2.25).  In  Chapter  3,  Monte  Carlo  results  are  provided  to 
provide  partial  justification  for  this  simplifying  procedure. 


Rate 

Estimator 


Chapter  3 


Performance  Evaluation  by  Monte 
Carlo  Analysis 


A  key  analysis  issue  is  to  assess  the  performance  of  the  proposed  algorithm.  The  interaction 
between  detection  and  estimation,  however,  makes  performance  analysis  of  this  decision- 
directed  procedure  extremely  complex  using  classical  procedures.  The  difficulty  is  due 
primarily  to  the  dependencies  present  in  the  adaptive  detector  that  are  introduced  by  the 
two-way  coupling  between  detection  and  estimation,  which  are  virtually  impossible  to  treat 
since  the  multivariate  distributions  are  not  available  in  analytic  form.  An  alternative  to 
a  classical  performance  analysis  is  to  conduct  Monte  Carlo  analyses,  and  to  evaluate  the 
performance  of  this  algorithm  on  the  basis  of  first  and  second  sample  moments  of  the  trial 
results.  To  this  end,  we  present  simulation  results  to  demonstrate  the  operation  of  the 
algorithms  presented  in  Chapter  2.  In  addition  to  using  estimates  of  the  prior  to  specify 
the  decision  rule,  it  is  also  necessary  to  estimate  certain  unknown  signal  parameters  such 
as  signal  strength,  which  results  in  a  further  generalization  of  the  likelihood  ratio  test. 

3.1  Signal  Modeling  Assumptions 

The  specification  of  the  parameters  of  the  controlled  experiments  to  be  conducted  under 
this  study  require  that  a  signal  model  be  supplied  to  generate  the  observations.  This 
signal  model  consists  of  a  time  history  of  the  marginal  probabilities  for  each  frequency  bin. 
Furthermore,  it  is  also  necessary  to  specify  the  structure  of  the  Markov  chain  model  used 
to  estimate  the  prior,  and  to  establish  procedures  for  decision-directed  estimation  of  the 
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signal  strength  and  the  noise  variance. 

3.1.1  Observations  Model 

For  all  simulations  done  in  order  to  arrive  at  Monte  Carlo  performance  estimates,  the  size 
of  the  analysis  blocks  is  chosen  to  be  32  samples.  This  means  that  the  size  of  the  DFT 
(FFT)  is  M  —  32  bins  wide,  and  that  the  frequency  detection  is  performed  on  the  bins 
indexed  0  through  16.  Any  frequency  data  in  bins  17  through  31  is  redundant  Larger 
block  sizes  are  possible  to  implement  since  the  computational  burden  grows  linearly  with 
relation  to  the  transform  size.  It  is  easier  to  run  Monte  Carlo  simulations,  however,  when 
the  size  is  relatively  small.  Energy  detection  is  then  done  for  each  bin  via  the  procedures 
outlined  in  Chapter  2  and  the  DTPP  describing  the  detector  decisions  drives  the  estimator 
for  the  a  priori  probability. 

In  general,  we  assume  the  signal  model  presented  in  Chapter  2.  namely:  Let  y(t)  be 
a  received  signal  composed  of  possibly  harmonically  related  sinusoids  contaminated  with 
additive  white  Gaussian  noise,  expressed  as 

p 

y(t)  =  cos(ojjt  +  <f>j)  +  v(t),  (3.1) 

j=i 

where  ujj  is  the  frequency  of  signal  j,  4> }  is  the  phase  angle  for  signal  j.  uniformly  distributed 
from  0  to  2tr,  A:(t)  is  the  time- varying  amplitude  of  signal  j.  and  v(t)  is  additive  Gaussian 
white  noise.  The  a:(t)  are  discrete-time  point  processes  (DTPP's)  describing  the  signal 
presence  at  time  t. 

As  previously  expressed  in  Chapter  2.  it  is  to  be  expected  that,  since  the  resolution  of 
the  FFT  is  not  arbitrarily  fine,  the  sinusoids  given  in  Equation  (3.1)  will  not  necessarily 
be  separated  into  distinct  bins.  Thus,  the  model  which  more  explicitly  fits  the  analysis 
scenario  to  which  we  have  committed  is  given  as  follows: 

L  ~  M 

y(U)  =  Y,  nk[t()Ak{t()cos(ujktt  +  <Pk)  +  v(t()  for  L  -  (3.2) 

k= 0 

where  k  is  the  bin  index  of  the  FFT,  is  the  center  frequency  of  bin  k.  Ok  is  the  phase 
angle  associated  with  bin  k.  A^it-e)  is  the  time-varying  amplitude  of  the  signal  for  bin  k. 
and  v(tf)  is  overall  additive  white  Gaussian  noise.  Since  this  model  differs  from  that  given 
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in  Equation  (3.1).  a  different  DTPP  is  defined  in  order  to  properly  distinguish  between 
the  two  models.  Thus,  the  new  DTPP  modeling  the  signal  presence  is  defined  as  nk(tf ). 


As  before. 


nk(te)  - 


1  if  signal  present  in  bin  k  at  time  t( 
0  otherwise 


Also,  let  us  define  the  detection  DTPP,  as 


Xk(te) 


1  if  signal  detected  in  bin  k  at  time  t( 
0  otherwise 


In  our  processing  model,  we  perform  the  analysis  of  the  data  in  terms  of  blocks;  hence,  the 
time  indices  for  the  various  quantities  are,  of  necessity,  discrete  and  quantized.  Therefore, 
in  order  to  avoid  confusion  with  simple  discrete  time  (the  sampling  instances),  the  subscript 
£  is  added  to  the  time  variable  to  indicate  the  particular  analysis  block  which  is  being 
processed. 

We  also  need  to  define  the  cr-fields  which  influence  the  detector  decisions.  For  a  funda¬ 
mental  description  of  cr-fields,  see  the  explanation  of  martingale  theory.  Suppose  that  Bt( 
is  the  (T-field  generated  by  all  past  decisions  and  all  past  detector  rates.  We  can  express 
this  as 

Btt  =  a(N{s),s  <  t(  A (s),  s  <  t(+i),  (3.3) 

where  X(s)  is  the  true  detection  rate  at  time  s,  and  N(s)  is  the  vector  of  detector  decisions 
at  time  s. 

In  order  to  fix  ideas  concerning  the  probability  structures  in  question,  let  us  define  the 
probability  structure  governing  the  DTPP  is  Equation  (3.1)  in  the  following  terms: 


7 Tj{t()  =  P{ signal  is  present  at  frequency  u?j  at  time  t(). 


The  probability  structure  for  Equation  (3.2)  will  be  defined  to  be: 


A k(tf)  =  P(signal  is  present  in  FFT  bin  k  at  time  tf). 


(3.4) 


(3.5) 


And  define  the  detection  probability  structure  to  be: 

A k(t()  =  P(signal  is  detected  in  FFT  bin  k  at  time  t(\Bl().  (3.6) 
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The  differences  between  these  three  probabilities  are  obvious,  and  yet  the  distinctions  can 
sometimes  be  lost  in  the  mass  of  notation  which  often  must  be  employed  in  attempting 
to  clarify  the  issues  involved.  In  more  succinct  notation,  and  shifting  to  the  point  process 
aspect  of  the  problem,  we  can  define  the  above  three  quantities  in  terms  of  the  rates  of  the 
point  processes  given  in  Equations  (3.1)  and  (3.2)  and  the  quantity  Nk(te),  the  detections. 
Let 

=  1), 

Afc(^)  =  P{nk{t()  =  1),  and  (3.7) 

\k(t()  =  P(Nk(te)  =  l\Bu). 

It  is  important  to  note  that  the  tr-field  Btl  is  not  observed  since,  even  though  we  know 
the  detection  process,  we  will  not,  in  general,  ever  know  A k(tg),  the  true  rate  of  detections. 
Thus,  we  let  Tt(  be  the  cr-field  generated  by  the  detections  as 

Tt'=o{ N(s),  3<te),  (3.S) 

where  N(s)  is  the  vector  of  detector  decisions  at  time  s.  Since  we  do  not  have  the  true 
detection  rates  A we  let  the  estimated  rate  be  A*,^).  We  then  express  the  estimate 
A (tf)  as 

Xk(te)  =  P(Nk(tt)  =  II*;,).  (3.9) 

This  rate  is  the  estimated  a  prion  probability  for  use  in  the  generalized  likelihood  ratio 
test.  Thus,  the  empirical  Bayes  description  of  the  processing  algorithms  [30].  The  results 
in  Equation  3.9  are  based  upon  the  foundation  of  the  Doob  decomposition  as  presented  in 
Section  2.2.2  and  Appendix  A. 

In  the  simulations  then,  we  consider  the  estimation  of  all  parameters  included  in  Equa¬ 
tion  (3.2)  except  u>k  -  this  parameter  is  implicitly  estimated  by  the  detection  of  the  signal 
in  the  kth  bin.  In  the  process  of  estimating  the  various  parameters,  it  will  be  helpful  to 
define  another  set  of  parameters  used  exclusively  in  the  estimation  procedure,  but  which 
relate  back  to  the  parameters  in  the  model  given  in  Equation  (3.2).  Define  these  as  follows: 

•  Ak(tf )  -  an  estimate  of  Ak(t?). 

•  ak  -  an  estimate  of  the  variance  of  the  noise  v(tf). 
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•  Afc(^)  -  an  estimate  of  the  detection  probability  \k(t(). 

In  the  Monte  Carlo  simulations,  a  time  history  for  the  probability  structure  on  A k(t,) 

is  defined  for  k  =  0 . 16  and  i  =  0, . . . .  60.  For  each  Monte  Carlo  run  we  process  960 

time  points.  The  probability  structure  is  generally  the  form  of  a  signal  swept  from  low 
frequency  to  higher  frequency  with  some  harmonic  content.  As  information  needs  to  be 
gathered  for  several  signal-to-noise  ratios,  the  simulations  are  set  to  run  10  Monte  Carlo 
trials  at  each  SNR  as  the  SNR  is  varied  from  about  12  dB  to  —12  dB  in  16  steps.  At  each 
SNR,  the  probability  of  error  is  computed  and  output.  These  data  points  constitute  the 
receiver-operating  characteristics  and  are  plotted  to  show  the  detector  performance.  Also, 
the  estimated  rates  for  the  various  DTPP’s  are  plotted  as  a  probability  surface.  These  plots 
demonstrate  the  algorithm’s  ability  to  track  the  defined  a  prion  probability  structure. 

It  should  be  noted  that  the  SNR  is  measured  with  respect  to  the  time-domain  signal  and 
therefore  does  not  reflect  any  bandwidth  measurement.  Thus,  the  actual  SNR  in  a  given 
bin  is  higher  than  the  SNR  in  the  time-domain  by  a  factor  proportional  to  the  number 
of  points  in  the  analysis  window.  We  can  explain  this  by  the  fact  that  the  noise  power  is 
evenly  distributed  into  the  various  bins  while  the  signals  are  positioned  within  only  one 
or  two  bins.  Thus,  comparisons  with  techniques  reporting  performance  with  respect  to 
SNR/Hz  are  not  directly  possible  for  this  research. 

3.1.2  Markov  Chain  Model 


In  our  estimation  procedure  for  the  rates  of  the  various  DTPP’s,  we  invoke  a  Markov  model 
to  describe  the  time  evolution  of  the  rates  as  described  in  Chapter  2.  Here  we  also  assume 
that  all  FFT  bins  will  be  governed  by  the  identical  Markov  transition  matrices  Q  and 
states  described  as  p.  This  simplifying  assumption  is  not  very  restrictive  since  sufficient 
dimension  is  allowed  to  estimate  the  state  of  the  Markov  chain  for  a  given  bin.  It  is  well  to 
note  that  the  state  estimators  for  the  state  vectors  are  all  done  independently  so  that 
there  is  no  coupling  among  rate  estimators.  This  is  partially  justified  since  in  Appendix 
B.3  we  demonstrate  that  the  noise  is  uncorrelated  between  FFT  bins. 

We  now  define  the  necessary  quantities  required  to  use  Equations  2.12  and  2.14.  For 
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the  simulations  we  will  set  r  =  7,  and  define  the  vector  p  and  the  matrix  Q  as 

PT  =  [P\  Pi  Pz  Pa  Pa  P6  H 

=  (.05  .20  .40  .60  .70  80  .90]. 

and 
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The  values  pj,  j  =  1, . . . ,  7,  are  the  states  of  the  vector  Markov  chain,  and  represent  the 
states  to  which  the  rate  \k(U)  may  transit  as  time  evolves.  Since  the  states  of  the  Markov 
chain  represent  probabilities,  the  only  real  constraint  on  the  states  is  that  0  <  p:  <  1. 
Since,  in  general,  the  estimator  employed  to  obtain  x*,  will  have  non-zero  values  for  all 
elements,  the  relatively  small  dimension  of  the  states  is  not  a  severe  restriction  since  the 
rate  estimate  will  actually  lie  on  the  convex  closure  of  the  states  defined  in  the  pk  vector. 
In  other  words,  A k(t()  is  a  convex  linear  combination  of  the  states  pr 

An  element  of  Qr  represents  the  probability  of  transiting  to  state  j  of  the  Markov 
chain  at  time  given  that  the  state  was  i  at  time  tf  i.e, 

q,j  =  P(xj(t(+l)  |x,(M). 

As  an  illustration  of  this  idea,  note  that  the  strong  diagonal  structure  of  Qr  indicates  that 
given  the  Markov  chain  is  in  state  j  at  time  t /,  it  will  likely  remain  there  at  tf+x.  Therefore, 
the  evolution  of  the  Markov  chain  is  described  by  the  equation  els  given  in  Equation  2.14. 
namely 

Xk(t/+ 1)  =  QTxk(te)  +  u  k(t,). 

This  can  also  be  modified  to  increase  the  off-diagonal  elements  in  order  to  make  the  system 
more  responsive  to  rapid  transitions  if  the  operational  scenario  warrants  this  assumption. 

3.1.3  Estimation  of  Signal  Parameters 

The  signal  strength  for  the  detector  structure  we  employ  is  also  known  as  the  envelope 
magnitude.  We  may  not  assume  that  the  measure  of  signal  strength  in  bin  k.  Ak(tf). 


is  known  a  priori,  so  it  is  important  to  investigate  how  one  might  generate  an  estimate 
Ak(^)  of  Ak{  t().  Probably  the  most  straight-forward  method,  and  one  that  has  previously 
been  used  with  success  in  [IS. 30],  is  to  compute  a  decision-directed  estimate  .4*(  t,  t.  as 
the  empirical  average  of  those  samples  Yk(t/)  for  which  a  detection  in  bin  A-  occurred  -  i.e.. 
when  Sk  occurs).  This  estimator  assumes  the  general  form 

Wit) 

Ak(tf)  =  Ak(tf. i)+  17—^  ' - [|r*(T(-)|  -  ,4fc( (3.10) 

3=  1 

where  the  \jl  is  a  constant  such  that  0  <  ^  <  1.  This  quantity  represents  a  "forgetting 
factor."  which  permits  earlier  estimates  to  be  discounted  in  favor  of  more  recent  data. 
Using  such  a  model,  smooth  changes  in  Akitf)  may  be  tracked. 

In  order  to  estimate  the  variance  of  the  Gaussian  noise.  u(t().  we  employ  a  slightly 
different  decision-directed  approach.  Since  the  variance  we  wish  to  estimate  is  contained 
in  the  raw  FFT  bin  data,  the  bins  useful  for  estimating  the  noise  variance  are  those  in 
which  no  signal  is  detected.  Also,  for  our  signal-plus-noise  model,  the  noise  variance  should 
be  equal  in  all  bins  since  we  assume  that  the  noise  has  a  white  spectrum.  So  we  compute 
an  estimate  noise  variance  for  each  eligible  bin.  then  average  across  all  eligible  bins  to 
arrive  at  the  global  noise  variance  estimate  &2(te).  The  estimator  for  each  bin  is  similar 
to  Equation  (3.10)  and  is  given  as: 

Tfei M  =  +  — — - - - [|h*:(ft)|2  —  (  3. 1 1  i 

X>"-’[1  -  Xk(s)) 

3=  1 

Note  that  the  recursions  presented  in  Equations  (3.10)  and  (3.11)  must  be  initialized 
with  some  a  prion  estimate  for  .-U(O)  and  A2( 0).  Furthermore,  the  ratios 


[l  ~  -%( M] 

X>"-’[l-.Y,(.s)J 


must  be  initialized  to  zero  at  t.,  ~  0  to  ensure  that  the  estimates  for  ,4t(  tf)  and  <r* ( t, 
well-defined,  and  are  equal  to  the  a  prion  values  until  observations  are  obtained. 


3.2  Spectral  Probability  Estimation 

Consider  the  problem  of  estimating  the  probability  of  signal  presence  in  a  given  frequency 
range  when  the  received  time-domain  signal  is  obtained  from  either  a  beamformed  array 
output,  or  the  output  of  a  single  transducer  such  as  a  sonobuoy.  Consequently,  the  time- 
domain  received  signal  is  given  by  Equation  (3.1).  and  after  the  A/-point  FFT  has  been 
performed  on  the  data  for  block  i.  we  have  the  frequency-domain  data  given  in  vector  form 


as  Y (tf)  or 


ho(  tf) 

}\(tf) 


Y  d')  = 


L  J 

where  L  =  Due  to  the  nature  of  the  scenario,  the  above  observations,  whether  in 
the  time  or  frequency  domain,  are  implicitly  functions  of  spatial  positioning,  but  for  the 
present  we  omit  any  reference  to  the  spatial  dependence  of  the  problem. 

Thus,  for  this  proble:  we  directly  apply  the  analysis  presented  in  the  preceding  pages 
to  arrive  at  the  estimati  for  the  marginal  probabilities  necessary  for  each  frequency  bin. 


3.3  Monte  Carlo  Simulation  Results 

3.3.1  Time-varying  Probability  Tracking 

Essentially  two  cases  are  examined  in  the  Monte  Carlo  simulations.  Case  one.  where  the 
only  the  rates  are  estimated  for  use  in  the  GLRT  and  case  two.  where  the  rates  and  the 
amplitudes  are  estimated  for  the  GLRT.  In  all  simulations  for  both  cases,  the  same  true 
tune-varying  a  prion  probability  structure  was  used.  This  is  shown  as  a  three-dimensional 
plot  in  Figure  3.1.  In  this  figure  .  the  blocked  time  index  runs  from  the  back  to  the  front 
of  the  probability  surface,  and  the  FFT  bin  index  runs  from  left  to  right.  Probability  is 
measured  perpendicularly  to  the  plane  according  to  the  scale  in  the  left-hand  corner  of 
the  surface.  Thus  the  figure  illustrates  a  high  probability  of  signal  presence  in  the  lower 
frequencies  during  the  first  few  time  blocks.  The  high-probability  frequencies  then  are 
increased  as  time  progresses.  Ultimately,  the  bin  with  the  highest  probability  is  bin  6  at 
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time  block  CO.  The  probability  surface  also  illustrates  harmonic  structure  in  bins  indexed 
as  integral  multiples  of  the  dominant  frequency  bin.  Some  alternate  ways  of  explaining  the 
plot  are  to  consider  the  representation  of  the  data  as  either  the  collection  of  the  marginals 
presented  as  slices  as  time  evolves,  or  as  a  side-by-side  collection  of  the  time  evolution  of 
each  marginal.  All  of  the  subsequent  three-dimensional  plots  are  constructed  in  the  same 
manner  with  the  requisite  quantity  plotted  against  time  and  frequency. 

In  Figure  3.2  we  show  the  behavior  of  the  rate  estimator  for  the  case  where  signal 
amplitude  and  noise  variance  are  assumed  known.  This  clearly  illustrates  the  tracking 
performance  of  the  algorithm  when  the  probability  structure  is  time- varying.  This  estimate 
was  performed  at  a  signal-to- noise  ratio  of  12  dB. 

Figure  3.3  is  the  average  of  the  rate  estimates  done  for  various  Monte  Carlo  simulations 
and  is  much  smoother  than  the  single  trial  estimate  shown  in  Figure  3.2.  In  this  figure, 
the  true  prior  probability  structure  is  more  obvious  due  to  the  smoothing  of  the  average. 
Thus,  it  is  evident  that,  in  the  mean,  the  algorithm  is  able  to  track  time-varying  probability 
structures.  The  confidence  in  the  estimate  can  be  expressed  by  examining  the  variance 
of  the  estimator  as  well  as  the  mean-square  error  for  the  estimator.  These  quantities  are 
plotted  in  Figures  3.4  and  3.5  respectively.  We  see  from  Figure  3.4  that  the  variance 
computed  from  the  Monte  Carlo  procedure  is  never  greater  than  0.0487.  and  generally 
is  between  0.01  to  0.02.  In  Figure  3.5,  we  see  that  the  mean-square  error  is  somewhat 
larger  along  the  high-probability  frequency  track  than  in  the  low-probability  regions  of  the 
spectrum.  That  is.  the  large  values  range  from  0.5  to  0.6,  but  there  are  only  about  five 
points  in  the  plot  where  these  values  are  present.  For  most  of  the  rest  of  the  spectrum, 
the  mean-square  error  is  less  than  0.1.  The  reason  that  the  errors  are  large  in  these  high- 
probability  areas  is  because  the  estimator  is  adaptive.  In  other  words,  the  estimates  for 
the  various  probabilities  are  causal  estimates  performed  in  real  time;  no  smoothing  is  being 
performed.  Notice  that  the  points  at  which  the  probabilities  shift  to  the  next  bin  are  the 
points  at  which  the  mean-square  error  is  large.  After  the  shift,  the  probability  estimator 
adapts  and  the  mean-square  error  gradually  decreases  to  a  somewhat  lower  level.  Both  the 
variance  and  mean-square  error  results  were  to  be  expected  from  similar  results  obtained 
in  [34]. 
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Figure  3.4:  Monte  Carlo  Variance  on  the  Estimated  A  Prion  Probability  Str 

(SXR-12dB). 
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Figure  3.6:  Estimated  .4  Prion  Probability  Structure  (SXR=6dB! 


Next,  we  show  results  similar  to  Figures  3.2  -  3.5  for  a  signal-to-noise  ratio  of  6  dB. 
These  results  are  still  for  the  case  where  only  the  rate  is  being  estimated  and  are  presented 
in  Figures  3.6  -  3.9.  As  is  to  be  expected.  Mie  detection  probability,  while  still  quite 
adequate,  is  a  little  less  responsive  to  tracking  the  time-varying  probability  in  Figure  3.6; 
and  the  Monte  Carlo  averages  in  Figure  3.7  are  not  quite  as  accurate  a  representation  of  the 
true  a  priori  probabilities  as  was  presented  for  the  12  dB  SNR  in  Figure  3.3.  The  variance 
and  mean-square  error  presented  in  Figures  3.8  and  3.9  respectively  are  also  greater  than 
in  the  plots  for  the  12  dB  simulations. 

Now.  we  examine  case  two,  where  the  rates  and  the  bin  signal  amplitudes  are  estimated 


3-15 


^  A  A  ^  &  -  ti k.V-  ^.Vk'fn.Vi.V  wV  V.  Li**  ■  wS.  A  A  u*k  A  1A1  uV **  JV  JV  u*V  A  ..V  -JV .*V V  A  JV  A «V  A  A  L*b 


'**o 

rie;ure  3.8:  Monte  Carlo  Variance  on  the  Estimated  A  Prion  Probability  Structure 

SXR=6dB). 


3-17 


. 't.  •ruv\vvv\^w'«vv\.v'..  wv_ v  -vcl*  ,wj<  7*  ?j<  ’■>  ’uttv* 


r  W’TT^T  *  ^  v  "*  ▼  ^  '»~T~ 

1  ■ »  k  »  .  «  *  • 


— I-'T  -  T-  v  ^JVWVVVWirr  V*  r 


i  \ 
tv 


Figure  3.10:  Estimated  A  Priori  Probability  Structure  (SNR=12dB). 

and  used  in  the  GLRT.  As  can  be  seen  from  the  plots  in  Figures  3.10  and  3.11.  the  estimator 
is  still  performing  quite  well  at  a  signal-to-noise-ratio  of  12  dB.  However,  upon  examination 
of  the  variance  plot,  we  see  that  the  estimator  variance  for  this  case  actually  has  a  lower 


absolute  maximum  than  the  variance  for  case  one.  Nevertheless,  the  overall  varian 


ce  m'i'IIs 


to  be  about  the  same  as  in  case  one.  The  mean-square  error  for  the  case  two  estimator  > 
slightly  greater  than  that  for  case  one. 

When  the  signal-to  noise-ratio  is  6dB.  Figures  3.14  and  3.15  show  a  significant  fa  he 
alarm  rate  in  FFT  bin  1.  Since  Figure  3.16  shows  a  high  estimator  variance  in  bin  .  ;• 
is  likely  that  the  false  alarm  condition  shown  in  Figure  3.14  did  not  occur  on  all  Mono 


3.13.  Mean-square  Error  for  the  Estimated  A  Prion  Probability  Structure 
12dB). 
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Figure  3.14:  Estimated  A  Prion  Probability  Structure  (SNR=6dB). 


Carlo  simulations.  This  high  false  alarm  probability  in  bin  1  can  be  explained  as  a  runaway 
condition  probably  due  to  the  inaccurate  estimation  of  the  bin  1  signal  amplitude.  With  the 
exception  of  bin  1.  however,  the  estimator  variance  shown  in  Figure  3.16  is  only  somewhat 
larger  than  the  variance  for  the  12  dB  SNR  simulations.  Even  given  the  rather  erroneous 
estimation  of  the  probabilities,  the  mean-square  error  plot  given  in  Figure  3.17  is  not 
dominated  by  the  problems  in  bin  1.  but  rather  shows  the  characteristic  peaks  alone  the 
high-probability  trace. 
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Figure  3.18:  Receiver  Operating  Characteristics  -  Known  Amplitude.  Estimated  A  Prion 
Probabilities. 

3.3.2  Receiver  Operating  Characteristics 

Whenever  detection  performance  is  evaluated,  the  common  performance  evaluation  is  done 
using  the  error  probabilities  at  various  signal-to-noise  ratios.  Plots  generated  using  such  a 
method  are  referred  to  as  receiver  operating  characteristic  (ROC)  curves.  In  Figure  3.18 
we  give  these  curves  for  the  case  where  only  the  a  priori  probabilities  are  estimated  for 
the  various  FFT  bins.  The  line  marked  by  dots  is  the  ROC  curve  for  the  empirical  Bayes 
procedure,  and  the  line  marked  by  the  crosses  is  the  ROC  curve  for  the  true  Bayes  detection 
procedure.  An  unexpected  feature  of  these  curves  is  that  they  cross  at  approximately  the 
6  dB  point  and  are  identical  at  the  12  dB  point.  The  difference  at  the  6  dB  point  would  be 
approximately  .02  percent,  a  very  small  advantage  for  the  empirical  Bayes  procedure.  From 
Robbins  work  [22].  we  do  expect  that  the  empirical  Bayes  and  the  true  Bayes  approaches 
should  converge  at  high  signal-to-noise  ratios.  This  is  obviously  tin’  case  here. 

Similarly,  the  curves  shown  in  Figure  3.19  are  given  for  the  case  where  the  signal  am- 


8  10  12  14 


-12  -10  -8  -6  -4  -2  0  2  4  6 

Signal-to-Noise  Ratio  in  dB. 

Figure  3.19:  Receiver  Operating  Characteristics  -  Estimated  Amplitude  and  A  Priori 
Probabilities. 

plitudes  is  estimated  as  well  as  the  a  priori  probabilities.  In  this  plot,  the  error  probability 
for  the  empirical  Bayes  procedure  is  slightly  lower  than  the  true  Bayes  only  at  the  12  dB 
point.  Again,  this  is  a  very  slight  difference  and  is  possibly  an  anomaly  in  the  data  which 
would  be  remedied  if  a  larger  number  of  Monte  Carlo  trials  were  conducted  and  averaged. 
From  these  two  graphs  then,  we  see  that  when  the  signal  bin  amplitude  must  be  computed 
no  serious  degradation  of  the  algorithm  performance  occurs. 


Bibliography 

[1]  W.  S.  Burdic.  Underwater  Acoustic  System  Analysis.  Prentice-Hall,  1984. 

[2]  J.  A.  Cadzow,  Y.  S.  Kim,  D.  C.  Shiue,  Y.  Sun,  and  G.  Xu.  Resolution  of  coherent 
signals  using  a  linear  array.  In  Proceedings  of  IEEE  ICASSP,  198 7,  Dallas,  Texas, 
April  1987. 

[3]  T.  A.  C.  M.  Claasen  and  W.  F.  G.  Mecklenbrauker.  The  Wigner  distribution  -  a  tool 
for  time-frequency  signal  analysis,  Part  III:  relations  with  other  time-frequency  signal 
transformation.  Philips  Journal  of  Research ,  35:372-389,  1980. 

[4]  T.  A.  C.  M.  Claasen  and  W.  F.  G.  Mecklenbrauker.  The  Wigner  distribution  -  a  tool 
for  time-frequency  signal  analysis.  Part  II:  discrete-time  signals.  Philips  Journal  of 
Research ,  35:276-300.  1980. 

[5]  T.  A.  C.  M.  Claasen  and  \\  .  F.  G.  Mecklenbrauker.  The  Wigner  distribution  -  a  tool 
for  time-frequency  signal  analysis.  Part  I:  continuous-time  signals.  Philips  Journal  of 
Research ,  35:217-250,  1980. 

[6]  L.  D.  Davisson  and  S.  C.  Schwartz.  Analysis  of  a  decision-directed  receiver  with 
unknown  priors.  IEEE  Trans.  Inform  Theory.  IT- 16( 3 ):270  -276.  May  1970. 

[7]  J.L.  Doob.  Stochastic  Processes.  John  Wiley  and  Sons,  Inc..  1953. 

[8]  \\ .  D.  Gregg  and  J.  C.  Hancock.  An  optimum  decision-directed  scheme  for  Gaussian 
mixtures.  IEEE  Trans.  Inform.  Theory ,  IT- 14(  3  ):451  -461 .  May  1968. 

[9]  J.  K.  Hammond  and  R.  F.  Harrison.  \\  igner-ville  and  evolutionary  spectra  for  co- 
variance  equivalent  nonstaionary  random  processes.  In  Proceedings  of  IEEE  ICASSP. 


V’rVOC'V.’<UrX  wVrV  •-  KSS.'-Sr  V. 


| 

!• 


W 


■.'.'_*,',.'’.l,.".V.V.V^ 


rjvvwvv’/v-.'vv rm-v\-v  ■„-» y-w y»  ir»  y-"^  ->  •  >'  ->  -J- 


•J«->  --•  v  -■ 


i 


i3(95,  Tampa,  Florida,  April  1985. 

[10]  A.  G.  Jaffer  and  S.  C.  Gupta.  Recursive  Bayesian  estimation  with  uncertain  observa¬ 
tion.  IEEE  Trans.  Inform.  Theory ,  IT-17(5):614-616,  September  1971. 

[11]  D.  L.  Jones  and  T.  W.  Parks.  A  high  resolution  data-adaptive  time-frequency  repre¬ 
sentation.  In  Proceedings  of  IEEE  ICASSP,  1987 ,  Dallas,  Texas,  April  1987. 


& 

I 

‘*N, 

& 

V 

n 


[12]  A.  Katopis  and  S.  C.  Schwartz.  Decision-directed  learning  using  stochastic  approxi¬ 
mation.  In  Proc.  3rd  Ann.  Pittsburgh  Conf.  Modeling  and  Simulation,  pages  473-481. 
Univ.  Pittsburgh,  Pittsburgh,  PA,  April  1972. 

[13]  D.  Kazakos  and  L.  D.  Davisson.  An  improved  decision-directed  detector.  IEEE  Trans. 
Inform.  Theory ,  IT-26(  1):113-116.  January  1980. 

[14]  N.  E.  Nahi.  Optimal  recursive  estimation  with  uncertain  observation.  IEEE  Trans. 
Inform.  Theory ,  IT-15(4):457-462,  July  1969. 

[15]  J.  Nevman.  Two  breakthroughs  in  the  theory  of  statistical  decision  making.  Rev. 
Inst.  Internat.  Statist.,  30:11-27,  1962. 

[16]  J.  H.  Painter  and  S.  C.  Gupta.  Recursive  ideal  observer  detection  of  known  M- ary 
signals  in  multiplicative  and  additive  noise.  IEEE  Trans.  Commun..  COM-21(S):948- 
953,  August  1973. 

[17]  E.  A.  Patrick  and  J.  P.  Costello.  Asymptotic  probability  of  error  using  two  decision- 
directed  estimators  for  two  unknown  mean  vectors.  IEEE  Trans.  Inform.  Theory, 
IT-14(1):160-162,  January  1968. 

[IS]  E.  A.  Patrick,  J.  P.  Costello,  and  F.  C.  Monds.  Decision-directed  estimation  of  a 
two-class  decision  boundary.  IEEE  Trans.  Computers.  C -  1 9(  3 ):  1 97  205.  March  1970. 

[19]  F.  Peyrin  and  R.  Prost.  A  unified  definition  for  the  discrete  time,  discrete-frequency, 
and  discrete-time/frequency  Wigner  distributions  IEEE  Trans  on  Acoustics.  Speech,, 
and  Signal  Processing ,  ASSP-34(  4):858  867.  August  1986 


Bib-2 


«  »  i 
* 


*v 


•  v. 


% 


■A 


A 


'■:A 


y 

*  m  ** 


.V.5 


>:• 


N^l 


ft? 


8 


•  .y 


s' 


■■-A 


[20]  J.  G.  Proakis,  P.  R.  Drouilhet.  Jr.,  and  R.  Price.  Performance  of  coherent  detection 
systems  using  decision-directed  channel  measurement.  IEEE  Trans.  Commun.  Syst.. 
CS-12(l):54-63.  March  1964. 

[21]  H.  Robbins.  Asymptotically  subminimax  solutions  of  compound  statistical  decision 
problems.  In  Proc.  Second  Berkley  Symposium  on  Mathematical  Statistics  and  Prob¬ 
ability ,  pages  131-148,  University  of  California  Press,  1950. 

[22]  H.  Robbins.  An  empirical  Bayes  approach  to  statistics.  In  Proc.  Third  Berkley  Sym¬ 
posium  on  Mathematical  Statistics  and  Probability ,  pages  157-164,  University  of  Cal¬ 
ifornia  Press,  1955. 

[23]  Y.  Rosen  and  B.  Porat.  Arma  parameter  estimation  based  on  sample  covariances  for 
missing  data.  In  Proceedings  of  IEEE  ICASSP,  1986.  Tokyo.  Japan,  April  1986. 

[24]  E.  Samuel.  Asymptotic  solutions  of  sequential  compound  decision  problem.  Ann. 
Math.  Statist .,  34:1079-1094.  1963. 

[25]  R.  O.  Schmidt.  A  Signal  Subspace  Approach  to  Multiple  Emitter  Location  and  Spectral 
Estimation.  PhD  thesis,  Stanford  University,  1981. 

[26]  S.  C.  Schwartz  and  L.  D.  Davisson.  Analysis  of  a  decision-directed  receiver  for  a 
Markov  sequence  with  unknown  transition  probabilities.  Problemy  Peredachi  Infor- 
matsn.  6(2):92-86.  April-June  1970. 

[27]  S.  C.  Schwartz  and  A.  Katopis.  Modified  stochastic  approximation  to  enhance  un¬ 
supervised  learning.  In  Proc.  19 77  Conf.  Decision  and  Control,  pages  1067-1079. 
1977. 

[28]  H.  J.  Scudder,  III.  Probability  of  error  of  some  adaptive  pattern-recognition  machines. 
IEEE  Trans.  Inform  Theory.  IT -11(3 ) :363— 371 .  July  1965. 

[29]  A.  Segall.  Recursive  estimation  from  discrete-time  point  processes.  IEEE  Trans. 
Inform.  Theory.  IT-22(4):422 -431.  July  1976. 


Bib-3 


V 

[30]  W.  C.  Stirling.  Simultaneous  Jump  Excitation  Modeling  and  System  P urometer  Esti¬ 
mation.  PhD  thesis,  Stanford  University,  1983. 

[31]  W.  C.  Stirling.  Simultaneous  system  identification  and  decision-directed  detection 
and  estimation  of  jump  inputs  to  linear  systems.  IEEE  Trans.  Automatic  Control. 
AC-32(l):86-89,  January  1987. 

[32]  W.  C.  Stirling  and  M.  Morf.  A  new  decision-directed  algorithm  for  nonstationarv 
priors.  IEEE  Trans.  Automatic  Control ,  AC-29(  10):928-930,  October  1984. 

[33]  W.  C.  Stirling  and  A.  L.  Swindlehurst.  Decision-directed  multivariate  empirical  Bayes 
classification  with  nonstationarv  priors.  IEEE  Trans.  Pattern  Analysis  and  Machine 
Intelligence ,  Sept.  1987.  (To  Appear). 

[34]  W.  C.  Stirling  and  A.  L.  Swindleurst.  Decision- Directed  Detection  of  Spatio-Temporal 
Sources.  Final  Report  TR  S101-86.1,  Brigham  Young  University.  Provo.  Utah.  August 
1986. 

[35]  A.  L.  Swindlehurst  and  W.  C.  Stirling.  An  adaptive  empirical  Bayes  decision-directed 
detector.  In  Proceedings  of  20th  Annual  Asilomar  Conference  on  Signals.  Systems, 
and  Computers ,  Pacific  Grove.  California,  November  1986. 


[36]  A.  D.  Whalen.  Detection  of  Signals  in  Noise.  Academic  Press,  1971. 


Appendix  A 

Key  Results  from  Martingale 
Theory 


The  most  common  applications  of  stochastic  process  theory  in  engineering  deals  with 
linear  estimation  (e.g.,  Wiener  and  Kalman  filters),  which  rely  heavily  on  the  notion 
of  uncorrelation  and  white  noise.  For  the  purposes  of  the  analysis  in  this  report, 
however,  the  estimation  problems  are  nonlinear,  and  these  notions  are  replaced  as 
fundamental  concepts  by  the  martingale  property.  Consequently,  it  may  be  useful  to 
provide  some  background  on  martingale  theory  which  will  render  the  results  of  this 
paper  more  understandable. 

A.l  Background 

Definition:  A  a-field  is  a  collection  of  subsets,  A ,  of  an  event  space,  S7.  such  that 

(a)  For  every  A  £  A ,  we  have  the  complement.  A  £  A. 

(b)  if  At.  A'2 .  An, . . .,  is  a  countable  sequence  of  elements  of  A,  then  U  An  £  A. 

(c)  0  £  A. 

The  elements  of  A  are  termed  events-,  the  set  f l  is  the  sure  event,  and  event  0  is  the 
impossible  event. 

Definition:  A  probability  space  is  a  triple,  {fi,  A.  P]  where  Q  is  the  event  space.  .4 
is  a  <7-field,  and  P  is  a  probability  measure  on  A.  i.e..  P  satisfies 


(a)  P{A}  >  0,  for  all  -4  £  A. 

(b)  P{Q}  =  1 

(c)  If  {.4n}  is  a  countable  collection  of  disjoint  events,  then  P{U^LjAn}  =  Xj^LiPi-4,, } 

Definition:  A  random  variable  X  is  a  real-valued  function  whose  domain  is  Q  and 
which  is  «4-measurable,  i.e.,  for  every  real  number  x,  the  set  {lj  £  ft|X(u>)  <  x}  £  A. 

Definition:  The  cr-field  generated  by  the  set  of  random  variables  {Xt,<  £  T }  is  the 
smallest  cr-field  with  respect  to  which  each  element  of  {Xt,  t  £  T}  is  measurable. 
This  cr-field  is  denoted  cr{Xt,  t  £  T}.  The  physical  meaning  of  the  <7-field  Bj  — 
cr{Xt,  t  6  T }  is  that  Bj  represents  all  of  the  information  contained  in  or  derived 
from  the  collection  of  random  variables  {Xt  t  £  T}. 

Definition:  Let  X  be  a  random  variable  on  P}  with  finite  expectation,  and 

let  B  be  a  sub  cr-field  of  A.  The  conditional  expectation  of  X  with  respect  to  B. 
denoted  EBX  or  E(X\B)  is  a  ^-measurable  random  variable,  uniquely  determined 
except  over  a  5-measurable  event  of  probability  zero,  which  satisfies 

/  XdP  =  /  EBXdP 

Jb  Jb 

for  all  B  £  B.  Although  it  will  not  be  proven  here,  it  is  a  standard  result  of  mathe¬ 
matical  probability  theory  (a  consequence  of  the  Radon-Nikodym  theorem)  that  the 
existence  and  uniqueness  (up  to  ZS-measurable  events  of  probability  zero)  of  such  a 
function  EB X  is  assured.  In  the  sequel,  conditions  that  hold  except  possible  on  a  set 
of  probability  zero  will  be  said  to  hold  almost  surely,  or  a.s.  Some  basic  properties 
of  conditional  expectations  may  be  summarized  as 

(a)  EB X  >  0  if  X  >  0  a.s.;  E6 X  =  0  if  X  =  0  a.s. 

(b)  E(  EBX)  -  E{  X)  a.s.;  £8(1)  -  1  a.s. 

(c)  Eb(cX)  =  cEB X  if  c  £  R. 

(d)  EB(  A' |  +  X2)  —  EBXi  4-  Eb X-)  if  both  expectations  exist. 

(e)  If  A'  is  5-measurable,  then  EB X  =  X  a.s.  and.  more  generally,  for  every  random 
variable  Y  on  {fi,  B,  P},  we  have  EB{ XY)  =  X  EBY . 
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(f)  If  Bx  €  B2,  then  EB'(EB*X)  =  EB'X  =  EB*(EB'  X). 


If  a  random  variable  .V  is  measurable  with  respect  to  a  (7-field  B.  this  fact  is  denoted 
by  the  notation  A'  6  B. 

Definition:  Let  {Bt,  t  =  0, 1, . . .}  be  an  increasing  family  of  a-fields  and  let  {Xt,  t  = 
0, 1,. .  .  ,  }  be  a  set  of  random  variables  such  that  A'(  6  Bt-  Then  {A'}  is  said  to  be 
adapted  to  {5}.  The  family  of  random  variables  {A'}  is  called  a  martingale  with 
respect  to  {S}  if 

(a)  Xt  G  Bt 

(b)  EB,Xt  =  X,  for  s  <  t. 

A  sub-martingale  is  defined  as  above  with  the  exception  that  2)  is  replaced  by 

EB‘ Xt  >  X,. 

A  martingale  difference ,  or  MD,  process  {z}  is  formed  from  a  martingale  process 
{Ar}  by  defining  xt  =  A^  -  Xt-\.  Thus^ 

EB’~'xt  =  E*-'Xt  -  Xt-t  =  Ar(_j  -  A(_i  =  0. 

It  is  useful  to  characterize  the  MD  property  as  being  intermediate  between  the  proper¬ 
ties  of  independence  and  uncorrelation,  since  every  two  independent  random  variables 
have  the  MD  property  with  respect  to  each  other  and.  in  turn,  every  two  random 
variables  that  have  the  MD  property  are  uncorrelated. 

Definition:  For  a  process  (possibly  vector- valued)  {X}  adapted  to  {5},  the  condi¬ 
tional  variance  (X.  X),  sequence  is 

(X,X)(  =  EBl-lX(t)XT(t). 

For  any  two  processes  {X}  and  {Y}  the  conditional  covariance  (X.  Y)(  is 

(X.Y),  =  EB-'X(t)YT(t). 

Definition:  The  sequence  {b}  of  random  variables  is  said  to  be  {B} -predictable  if 
bt  G  Bt- 1  for  all  <  =  0.1,....  If  {6}  is  {£?} -predictable  and  { r }  is  a  MD  with  respect 
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to  {£>},  we  define  the  process  {6  •  r>}  by  bt  ■  vt  =  btvt,  the  product.  Then  {b  ■  r}  is 
a  MD  sequence  with  respect  to  {5}.  This  sequence  is  termed  the  MD  transform  of 
M  bv  {b}. 

Definition:  For  any  sequence  of  random  variables  {yu  t  =  0,1,...}.  let  Ft  — 
cr{  y3,  s  =  0, 1, . . .  t).  Then  the  sequence  vt  —  yt  —  yt  is  the  general  innovations 
of  the  process  {y}. 

The  development  of  the  discrete-time  point  process  estimator  used  in  this  report 
relies  two  basic  properties  that  are  central  to  nonlinear  estimation  theory.  These 
two  properties  are:  1)  the  innovations  theorem,  and  2)  the  representation  theorem. 
These  results  are  stated  without  proof  for  the  discrete-time  case. 

Innovations  Theorem:  The  general  innovations  process  is  a  martingale  that  is 
equivalent  to  the  observed  process,  i.e., 

Ft  =  <  t)  =  <r{i/s,s  <  t}. 

Representation  Theorem:  Every  MD  sequence  {u>}  with  respect  to  the  {F}  can 
be  represented  as  a  MD  transform  of  the  innovations  {u},  i.e., 

wt  -  btvt 

where  {6}  is  an  {F}  predictable  process. 

A. 2  Discrete-Time  Point  Processes 

A  discrete-time  point  process  (DTPP)  {.V(f),  t  =  0. 1 _ }  is  a  binary  {0.  1 }  sequence 

describing  the  occurrences  of  some  type  of  (possibly  vector- valued)  event.  Thus. 
Fit)  =  1  means  that  the  event  occurs  at  time  t.  and  .V (t)  ~  0  means  that  there 
is  no  occurrence  of  the  event  at  time  t.  The  simplest  example  is  when  {-V}  is  a 
Bernoulli  sequence,  i.e.,  a  sequence  of  independent  random  variables  with  P{F(t)  = 
1}  =  1  -  P{F(t)  =  0}  =  A (t).  The  quantity  A (t)  is  the  rate  of  X(t).  and  may  in 
general  be  time- varying.  In  many  applications  the  occurrences  will  not  be  mutuallv 


independent,  but  the  probability  of  occurrence  at  a  given  time  will  be  affected  by 
previous  occurrences  and  perhaps  by  some  other  related  process.  x(f).  in  which  case 
the  rate  of  the  process  will  be  affected  by  the  past  history  of  {x}  as  well  as  by  {-V}. 

A. 2.1  Doob  Decomposition 

We  present  a  fundamental  theorem  due  to  Doob  [7], 

Doob  Decomposition:  For  an  arbitrary  sequence  {y}  adapted  to  a  family  {Z?}  of 
a- fields,  define 

A  (t)  =  EB'~'y(t) 

and 

w(t)  =  y(t)  -  EB,^y{t). 

Then  the  following  properties  hold: 

y(t)  =  A  ft)  +  w(t) 

(a)  {A}  is  {5} -predictable  and  {w}  is  a  MD  sequence  with  respect  to  {£>}; 

(b)  the  above  decomposition  is  unique: 

(c)  if  {?/}  is  a  {£>} -submartingale  difference  (subMD)  sequence,  i.e..  if 

EB‘~'y(t)  >  0. 

then  A (f)  is  positive  for  all  t. 

Proof:  Properties  1)  and  3)  are  trivial.  To  prove  2).  suppose 

y(t)  =  A  '(t)  +  w'(t) 

where  {A'}  is  {5} -predictable  and  {u/}  is  a  {£5}  MD  sequence.  Then 
0  =  EBt~'w'(t)  =  Eb'~'  y(t)  -  A \t)  =  Aft)  -  A '(f) 

which  proves  2). 

For  point  processes, 

Aft)  =  Aft)  +  w(t) 

is  the  Doob  decomposition  of  {.V}  with  respect  to  {£}. 


\.r 4»Tl’'VTi£TiiFV-w'-  r<riyi.r 
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A. 2. 2  Estimation  from  Discrete-Time  Point  Processes 

It  is  useful  to  outline  the  development  of  the  discrete-time  point  process  estima¬ 
tor  used  in  this  analysis.  This  development  follows  [29].  Suppose  the  rate  of  the 
DTPP  {Ar}  (A (t))  may  be  characterized  as  a  finite-state  Markov  chain,  with  states 
Pi(t),. . .  ,pT{t).  Define  the  vector  x(t)  with  element 

_  /  1  if  A(*)  =  Pi(t) 

1  |  0  otherwise 

and  the  probability  transition  matrix  Q (t)  with  elements 

<?qP)  =  +  1)  =  1|  n{t)  =  !}• 

Define  the  cr-field 

Bt  =  a{N(s),s  <  t ,  x(s),s  <  t  +  1}. 

The  r-dimensional  vector  process  xp)  trivially  obeys  the  relation 

x(<+  1)  =  EB‘-'x(t+  1)  +  [xp  +  1)  -  EB'~'x(t  +  1). 

It  can  easily  be  seen  that  the  process  EBt~'x(t  +  1)  is  {5} -predictable,  and  that 

up)  =  xp  +  1)  -  EB‘-'x(t  +  1) 

is  a  {£>} -MD  sequence.  Thus. 

x(t  +  1)  =  EB,~‘x(t  +  1)  +  up)  (A.l) 

is  analogous  to  the  classical  signal-plus-noise  model.  It  can  be  seen  from  construc¬ 
tion.  however,  that  this  decomposition  is  fundamentally  different  from  the  classical 
situation,  and  that  the  process  {u}  is  not  an  independent  process. 

We  may  evaluate  EB’~ 1  x(  t  + 1 )  for  the  Markov  chain  model  as  follows.  Since  x(  t )  com¬ 
pletely  characterizes  the  expected  behavior  of  x(t  +  1 )  (due  to  the  Markov  structure) 
we  haw 

r 

EB,-'x,(t  +  1 )  =  E[ii(t  +  l)|x(f)]  =  P{x,(t  +  1)  =  l|xp)}  =  ^  <1  j,(Oxj(t). 

j= i 
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Thus, 


EB,~lx(t  +  1)  =  Q  T(t)x(t), 


and  we  have  the  representation 


We  mav  define 


x(t  +  1)  =  Q T{t)x{t)  -I-  u (t). 


w(t)  =  N(t)  —  A  (t) 


where  A (t)  =  EB'~'N(t).  The  process  {w}  is  a  {5}  MD  process.  Under  the  Markov 
model,  the  components  of  x{t)  are 


x,{t)  = 


1  if  A (t)  =  p,(t) 
0  otherwise 


Thus,  we  mav  write 


where 


X(t)  =  pT(t)x(t)  +  rv(t) 


P(t)  = 


pj{t) 


Pr(t) 


We  are  interested  in  the  conditional  expectation  of  x{t  4-  1)  given  Tt.  To  obtain  this 
representation,  we  first  form  the  process 

H(t)  =  E*x(t  +  1)  -  E*-1  EB-lx(t.  +  1). 

and  note  that  this  quantity  is  a  {jT}-MD.  Since  JT(_,  c  Gt-i  and  using  (A. 2)  we 


obtain 


fi(t)  =  E*'x(t  +  1)  -  Q T{t)EJr,~'x(t). 


We  also  note  that  the  process 


/(/)  =  N(t.)  -  E*-'  EB'~'  X(t)  =  N(t)  -  pT(t)EJr-'x(t) 


- t-s-j-’j-:.-  v.y.y.y '--'--.y 


vH'-..-v_-  v  ^^*'  .  '  '’•  '"V'VrV'V  'T.  v'>Tr,v.V.  'r.V,V.V,.v, 


is  a  {J^J-MD.  where  we  have  used  (A. 4)  and  the  fact  that  {ic}  is  a  {£>}-MD.  The 
representation  theorem  states  that  we  can  express  {p}  in  terms  of  {u}  as 


H{t)  -  B (t)u(t) 


( A .  6 ) 


for  some  {.^-predictable  (matrix)  sequence  { B } .  The  matrix  B( t)  can  be  obtained 
by  computing  the  conditional  covariance  process  with  respect  to  {E},  which  implies 


which  yields 


(/z,i/)(  =  B (t)(is.  u)t 


B(f)  =  (p.  v)t[{ v,v)t\  \ 


( A .  7 ) 


Thus,  from  (A. 5),  (A. 6),  and  (A. 7)  we  obtain 

Ejr'x{t  +  1)  =  Q T(t)Ejr,-'x(t)  4-  (p.i/)t[(i/,  i/)t]_V(f), 
or,  using  the  notation  y(t\s)  —  E^’yit),  for  any  process  y,  we  obtain 
x(t  +  l|i)  =  Qr(t)x(t|t  -  1)  +  (p, !/),[( v.v)t]~'v{t). 


Calculation  of  Conditional  Covariance 

The  covariance  matrix  may  be  expressed  as 

E*-'[tJi(t)u(t)]  =  E {E*[x(t  +  l)]i/(<)  -  E*-[x(f  +  l)i/(t)]} 

and  since  E^-'xit  +  1)  6  Et-\  and  u(t)  is  a  {^"}-MD,  the  second  term  of  this 
expression  on  the  right-hand  side  vanishes.  Thus,  we  may  write 
Er'-'[n(t)v{t)\  =  £*-‘£*[x(<  +  1M#)]  -Q7(l)^-'[x(f)i/(f)] 

=  et —  {fQr(t)x(<)  +  u(t)][xr(np(t)+  w(t)-kT(t\t  -  npm]} 

=  Er,~'  |q r(t)x(t)xT{t)p(t)  +  QT(t)x(t)w{t)  -  Q T(t)x(t)xTlt\t  -  1  p 
u(  t)xT(t) p(  t)  +  u {t)w(t)  -  u(t)xT{t\t  -  l)p(t)} 
where  we  have  used  (A. 4),  (2.20),  and  (2.15).  Noting  that  x(t)  €  £>,_).  x(t\t  —  1  j. 
and  u(t)  and  w(t)  are  {ZS}-\ID  processes,  tills  expression  simplifies  to 

E*-'[ti(t)v(t)]  =  ETt~'  {Q T(t.)[x(t)xr(t)  -  x(t.)xT(t\t  -  l)]p(/)  4-  u (t)w(t)}  (A. 9) 


Solving  for  E^-1  [u(t)u’(t)]  yields 

E^-'iudMt)}  =  Et‘-'  {[x(#  +  1)  -  Q (t)Tx(t)}[X(t)  -  xT(t\t  -  l)p(f)]}  . 

Since  Et_\  C  0f_,  we  may  write 

Ejr‘~1[u(t)w(t)}  =  E^'-'E6'-1  |x( t  +  l)-V(f)  -  x(t  +  l)xr(t|t  -  l)p(t) 

-Q T(t.)x(t)N(t)  +  QT(t)x(t)xT{t\t  -  l)p(f)} 

where  we  have  multiplied  out  the  cross  products.  Substituting  (A. 3)  and  (A. 4)  yields 

E jr,-i  (u(t)uj(t)]  =  E^'-'E6'-1  [x(t  +  l)JV(i)  -  [Q T(t)x(t)  +  u(;l)]x:r(t|!'  -  1  )p(t) 

-  QT(t)x(t)[xT(t)p(t)  +  u?(f)]  +  QT(t)x(t)xT{t\t  -  1  )p(<)} 

We  note  that  since  x(t|t  —  1)  €  Bt-\  and  u(t)  is  a  we  have 

£'Sf-,[u(tI)x:r(t|!1  -  l)p(<)]  =  0- 

Similarly,  since  x{t)  6  Bt- 1  and  w(t)  is  a  {5}-MD.  we  also  have 

EB,~'  [Qr(<)x(<)u>(<)]  =  0. 

Thus,  after  simplification,  we  have 

£^-'[u(*M0j  =  E^'-'E8'-'  {x(#  +  l)X(t)  -  Q T(t)x(t)xT(t)p(t)}  . 
Substituting  this  expression  into  (A. 9)  yields,  after  straightforward  manipulation. 
EF,-'[fi(t)v{t)\  =  E*~l[x(t  +  1  )-V(/)]  -  QT(t)x(t\t  -  l)xT(t\t  -  Vp(t).  (  A.  10 i 

To  complete  the  development  we  will  assume  that,  given  Bt-\-  the  values  assumed 
by  x(t  +  1)  and  . V ( / )  are  independent  and.  consequently, 

E6-'[x(t  +  1  )-V(f )]  =  E8'-'x(t)EB-'N(t)  =  Q  T(t)x{t)xT(t)p(t).  (A. Ill 

But 

E*-'[x(t  +  1  )-V(<)]  =  E*-' EB,-'[x(t  +  1  )A’(f)]. 

therefore. 

£*-'[&(  tMO]  =  E*'-'\QT(t)x(t)xT(t)p(f)\  -  Qt(  t)x(  —  l)xT(t|f-  1)  pit) 


A -9 


V 


but.  since  x(t)  has  one  and  only  one  non-zero  component,  we  may  write 

x{t)xT(t)  =  diag  {x(t)} 

where  diag  {•}  denotes  a  diagonal  matrix  whose  diagonal  elements  are  composed  of 
the  elements  of  the  vector  argument.  Thus, 

ET'-l\n{t)v{t)\  =  QT(f)diag  |x(f|f-  1  )}p{t )  -  Qr(<)x(  t|t  -  1  )xT(t \t  -  1  )p(  t ).  (A.  12) 
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Calculation  of  Conditional  Variance 


The  conditional  variance  {v.v)t  consists  of  the  quantity 


where 


E*-l[v(t)v(t)}  =  {E^-M-V(t)  -  \(t\t  -  1))(.V(T)  -  Mt\t-  1))} 


\(t\t  -  1)  =  pT(t)x(t\t  -  V 


1  A.  13 ! 


is  the  conditional  expectati*  ol  \(t)  given  Pt-i-  Thus.  Er,-'v2(t)  may  be  easih 
evaluated  as 

Er'-lv2[t)  =  Ec'~'(X2{t)  -  2.V(t)A(i|t  -  1)  +  A2(f|f  -  11) 

=  A(t|t  -  1)  -  A2( t|<  -  1) 

since  X2(t)  =  X(t)  and  A(f|f  —1)6  E,_ j. 
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Appendix  B 

Envelope-squared  Detection 


I 


B.l  Envelope  of  a  Narrowband  Process 


A  random  signal  n(t)  is  said  to  be  a  narrowband  noise  process  if  the  spectrum  of 
n(t )  is  zero  except  for  a  narrow  region  about  u?  =  0  or  ui  =  jOc  where  uic  is  a  carrier 
frequency.  Let  us  represent  the  process  n(t)  as  a  pair  of  quadrature  components  as 
given  in  Whalen  ;36]  as  follows: 

n(t)  =  x(t)  cos  (u>ct)  —  y(t)  sin{u>ct)  (B.l) 

where  x(t)  and  y(t)  are  also  narrowband  processes  with  power  or  variance  a'.  We 
then  refer  to  the  envelope  of  the  process  n{t)  as 

z(t)  =  [x2(t)  +  y2{t)}7 .  (B.2) 


Now,  define  the  phase  angle  as 


a,..  .  y(t) 

8{t)  =  arctan - . 

x(t) 


The  inverse  operations  are  then 


( B.3) 


x{t)  =  z{t)cos8  y(/)  =  c(t)sin^. 


and  the  Jacobian  of  the  transformation  is  z(t).  The  joint  probability  density  function 
for  zlt)  and  8  is.  suppressing  the  time  arguments: 


.yvtv.w.yy.  v  vTyyyy  v wv.w  *>' yj* y  v  *>  y  *.y  *yy  *>'  t?  yy  *>  ?  v  ■>  *>  ■ 
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Since  the  phase  angle  is  uniformly  distributed  in  [0,27t],  we  integrate  Equation  B.4 
to  get 

r2n  -  T  -2  1 

(B.5 


_2 

'2a2 


c  >  0. 


p(~)  =  /  p(z,8)d8  =  — j  exp 

JO  (T 

This  is  the  well-known  Rayleigh  density  function.  However,  since  we  wish  to  employ 
the  square  of  the  envelope  in  a  detector,  we  now  let  u(t)  =  z2(t ),  and  the  density 
function  becomes 


M=^exp 


u 

2^2  j 


(B.6) 


This  is  an  exponential  distribution,  or  the  non-normalized  \2  distribution  with  two 
degrees  of  freedom.  If  we  define  the  normalized  variables 

x'(t)  =  ^  and  y'(t)  =  ^ 
a  <J 

then,  for  q  =  [ x a  +  y'2],  we  then  have  the  \2  distribution  with  two  degrees  of  freedom: 


P(<l)  = 

with  mean  and  variance  E{q}  —  2  and  V {q}  =  4. 


(B.7) 


B.2  Envelope  Squared  of  a  Sinusoid  plus  Narrow- 
band  Noise 

Consider  the  signal  model  now  to  be  a  sine  wave  with  additive  Gaussian  noise.  n(  t). 
so  that 

f(t)  =  Acos(u!ct  +  d>)  +  n{t ), 

or.  equivalently 

f(t)  =  [.4  cos  0  4-  x(  t)]  cos(u;c#)  -  [.4  sin o  4-  y(t) ]  sin( u.\.t ).  (  B.S ) 

where  0  is  uniformly  distributed  in  [0.  2tt j .  The  envelope  squared  is  then 

u(t)  =  [.4  cos  o  +  x(  t )]‘  4-  [.4  sin  0  4-  ij(  t  )]*. 

The  probability  density  function  for  u  is  thus,  supressing  the  time  arguments. 


B.O ' 


1 


/Hu)  =  exp 


-1-!  u  4-  .4  )2 
2a1 


B-2 


.4  u 1  /2 


B.10) 


v  dv  V  -.’  V  -A*  •■V%"  vVA‘\'/v^GV.C%V.^^,A,.V.S,.V.y.l.V.  .VA  .^A1  -  v  e,  . 


Equation  B.10  is  the  non-central  y2  distribution  with  two  degrees  of  freedom.  If  we 


now  normalize  the  variables,  as  done  above  in  Section  B.l,  we  have  a  new.  normalized 


non-central  y2  variable: 


(-d  -f  xf  (A  +  y)2 
a2  +  a2  ' 


(B.li; 


And  the  density  for  q  is  a  modified  form  of  Equation  B.10  as  follows: 


/>(<?)=  ^exp  to ((97)1/2) 


(B.12) 


where  7  =  2 A2 /a2  is  defined  as  the  non-central  parameter  for  two  degrees  of  freedom. 


B.3  DFT  Bin  Noise  Correlation 


Theorem: 


Given  a  white  Gaussian  noise  sequence,  n(t),  such  that  E{n(t)}  =  a 2,  and  E{n(t )n‘(s )} 
(J26u,  where 


6t,= 


1  for  t  =  s 


0  otherwise 


is  the  Kronecker  6  function,  then  E{N(k)Nm(l)}  =  cr28k,i  for  the  DFT  sequence  N{k) 


defined  as 


-v(fc)  =  -7W  Y  n(t). 

V'V  (=0 


Proof: 


The  autocorrelation  function  for  the  DFT  sequence  .V ( k )  is  given  by 


1  ,v-i 

1  _ — .  -,2rkt 


E{.\(k-)X‘(I) |  =  £{  —  Yi  '  v  «(*)  £  e  n'(s)}. 


,  V-  1  v- I 

1  x - .  t - ,  -,2-k!  .2-ris 


t  X  f 


E{  n(t)nm(s)} 


where  E{  n\  t)nml  s  ) }  =  because  the  process  n(t)  is  white.  Then.  Equation  B.14 


E \  X  \  k  1 1)  = 


ft1  TT 


=7! 


(  B .  1 5 1 


4 
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This  is  in  the  form  of  a  geometric  series  which  we  can  alternatively  express  as 

E{N(k)N*(l)}  =  tt  1  ~  6  (B-16) 

xV  1  -  e  * 

where  b  =  k  —  /.  Reducing  fractions, 

tj2,  1  —  e~22*b 

E{N(k)N-(l))  =  - - .  (B.17) 

iv  i  —  g  ?r 

The  quantity  b  can  only  assume  integer  values  0, 1, . . .  ,N  —  1.  Therefore,  from  the 

form  of  Equation  B.17  it  is  obvious  that,  since  exp[— j2irb]  =  1  for  b  =  1.2 . A  —  1. 

the  numerator  is  just  1  —  1=0  and  the  denominator  is  non-zero,  so 

E{N(k)N'(l)}  =  0  for  k  ±  l. 

However,  since  the  denominator  is  also  zero  when  k  =  /,  L'Hospital’s  rule  must  be 
used  to  resolve  the  indeterminate  form.  This  yields 

rr2  ir>irhp~]27rb 

E{N(k)N-(l) }  =  zgr  =  S  for  b  =  0  (or  k  =  (.) 

7v"e  ' 

The  autocorrelation  for  the  noise  in  different  bins  of  the  FFT  is  given  then  as: 

f  a2  for  k  =  l 

E{N(k)N'(l)}  =  ( B .  IS ) 

(,  0  for  k  ^  l. 

B.4  Detector  Structure 

In  the  following  discussion,  we  will  supress  all  time  arguments  for  the  sake  of  brevity. 
For  the  detector  design,  we  will  allow  two  hypotheses  for  each  signal  band.  or.  in  our 
case,  each  FFT  bin.  These  will  be  defined  for  each  FFT  bin  k.  as 

Hk'0-.Fk=Nk 
Hk,i-Fk—Ak  +  Nk 

where  Fk  is  the  A: th  bin  of  the  transform  of  /ft).  Ak  is  the  signal  magnitude  in  the  A-th 
bin.  and  Nk  is  the  noise  process  in  the  fcth  transform  bin.  Since  n(  t )  is  a  Gaussian 
process,  the  noise  in  the  various  FFT  bins  are  uncorrelated  as  shown  in  Section  B.3 
in  Equation  B.18.  This  allows  the  use  of  independent  detectors  in  our  application. 


B-4 


The  detector  then  tests  on  the  likelihood  of  the  above  two  hypotheses  and  yields  a 
decision  Dk(F)  as  follows: 


|  1  if  P'yHkMFklH^)  >  P{Hk,0)p(Fk\Hkv) 
[  0  otherwise 


(B.19) 


where  P(Hk,,)  is  the  a  priori  probability  for  the  ith  hypothesis  in  the  kth  FFT  bin. 
and  p(Fk\H k,i)  is  the  hypothesis-conditional  probability  density  function  the  form  of 
which  will  be  given  presently.  Equation  B.19  can  be  given  as  a  ratio  of  the  form: 


Dk(F) 


,  -r  P(HkA)p(Fk\HkA) 

1  11  P(Hk,0)p(Fk\Hk>0)  ^  1 

0  otherwise 


(B.20) 


or  as  a  log-likelihood  function 

(  1  if  log  P(Hk,\ )  +  log  p(Fk\Hk,i)  -  log  P(Hk,o)  -  logp(Fk\Hk,o)  >  0 

Dk(F)  = 

1  0  otherwise 

( B  .21 ) 


The  quantities  given  in  Equations  B.19  -  B.21  will  now  be  defined.  The  a  prion 
probabilities  for  the  binary  decision  case  are  given  as: 


P(Hk,  0)  =  1  —  Afc 
P(Hk,  i)  =  A*. 


(B.22) 


Since  the  FFT  results  in  a  complex  output  for  each  bin  k.  we  define  some  variables 
which  will  expedite  the  application  of  the  \2  density  functions.  Let 

xi,k  =  Real[iqt] 
x2,k  =  Imaginary  [iq..] . 


and 


qk  = 


'l.k 


+ 


''l.k 


And  the  hypothesis  conditional  densities  are  of  the  forms  given  in  Equations  B.7 
and  B.12.  For  Hk.o  we  assume  that  qk  is  drawn  from  a  central  \2  distribution  with 
probability  density  as  follows: 


And  for  Hk,  1  we  assume  that  qk  is  drawn  from  a  non-central  \;2  distribution  with 
probability  density  as  follows: 


PiQk)  =  -  exp 


~(k  +  <?fc 


Iottqrtkf2) 


(B.24) 


w 


here 


Ik  = 


2  Al 


When  the  quantities  given  in  Equations  B.23  and  B.24  are  substituted  into  Equation 
B.21,  we  arrive  at  the  log- likelihood  ratio  test  for  this  problem  presented  as 

1  if  log  \k  -  ^  4-  log( io [( 9A: Tfc ) 1  /2 ] )  -  log(l  -  \k)  >  0 
0  otherwise 


Dk(F)  =  { 


(B.25) 
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